Hi team, I am currently testing Meltano as an opt...
# troubleshooting
a
Hi team, I am currently testing Meltano as an option for our EL(T) - from Mongo into Snowflake. I was able to test it out locally and things looked fine. Next steps were to add meltano to our k8s cluster using helm and argocd. I am using this Repository of Helm Charts for use with Meltano from: https://github.com/meltano/helm-meltano for the same. What I have done so far: •
cd <path-to-meltano-folder-on-our-k8s-infra-repo>
helm repo add meltano <https://meltano.gitlab.io/infra/helm-meltano/meltano-ui>
helm search repo meltano-ui
◦ This returns: ▪︎
meltano/meltano-ui   0.3.0   1.96.0   Meltano UI webservice
• So, I do a
helm pull meltano/meltano-ui
followed by
tar -xvzf meltano-ui-0.3.0.tgz
(and delete the tar file) • After this I add the meltano app to argocd and finally run
kubectl apply -f meltano.yml
After pushing the changes, I see my pod in argocd but with an error that says:
Copy code
`meltano ui` must be run inside a Meltano project.
Use `meltano init <project_directory>` to create one.
So, I went into the deployment.yaml file and changed:
Copy code
command: ["meltano", "ui"]
to
Copy code
command: ["/bin/sh","-c"]
args:
  - meltano init mongo-snowflake-sync;
    cd mongo-snowflake-sync;
    echo pwd;
    pwd;
    find .;
and also updated the mount path to:
mountPath: "/project/mongo-snowflake-sync/.env"
because “pwd” said the my current directory was
/project/mongo-snowflake-sync/
Also, under the values.yaml file, I updated the database URI like so:
Copy code
database_uri: "sqlite:///project/mongo-snowflake-sync/.meltano/meltano.db"
Now, I see this error:
Copy code
Could not connect to the database after 3 attempts. Max retries exceeded.
Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to
join our friendly Slack community.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: <https://sqlalche.me/e/14/e3q8>)
OR
Copy code
(sqlite3.OperationalError) unable to open database file
(Background on this error at: <https://sqlalche.me/e/14/e3q8>)
I tried multiple different options for the database URI and mount path with slightly varying results. For instance, I tried adding a “find .” to the commands to check all the files that were created in the meltano init step and while sometimes I could see the meltano.db most other times I did not(this might be a k8s thing though because I cannot reproduce that scenario where the meltano.db was getting listed in the find command 🤷). However, in all cases, I did still get the same error as above i.e. Could not connect to the database after 3 attempts. Max retries exceeded. I feel pretty much stuck at this point and out of ideas. Would appreciate any help/suggestions! TIA 🙏
v
Not certain about everything, but I know that version
1.96.0
is old, and I know the database connect issue was fixed after that version, so if you could upgrade meltano maybe you'd be off to the races?
w
CC @ken_payne
a
@visch That is interesting to know but I am not sure how to add a version to this repo. When I run:
helm search repo meltano-ui
(or meltano), I only get this one version:
NAME                    CHART VERSION   APP VERSION     DESCRIPTION
meltano/meltano-ui      0.3.0           1.96.0          Meltano UI webservice
Seems like this version needs to be updated: https://github.com/meltano/helm-meltano/blob/704ad4bb0adbd1c2b3f34f1e215c96571133cdf5/meltano-ui/Chart.yaml#L24 ?
v
Possibly, but the helm chart I would imagine* should add a DB (that's not sqllite) to the project so it could be something else too @Will Da Silva (Arch)’s tag is right here
k
Thanks for checking this out @abhinav_prakash 🙏 Both the Meltano image to use and the database URI are configurable, via the normal
values.yaml
route for Helm. Here are the defaults, which is where the SQLite URI is coming from. In the squared project (our internal deployment of Mletano) we substitute the docker `repository`/`tag` for custom values with our project installed in it and the
database_uri
for a running RDS Postgres instance (different for prod/staging) managed by Terraform. You can do the same with your project 🙂
a
@ken_payne thanks for the response thankyou Couldn't get to test it out as yet, driving for most of today. But I'll try to implement what you suggested and reach out in case I get stuck.
Hey @ken_payne, Sorry for the late response on this, I was traveling the past week. I did try setting up meltano the way you suggested and was able to get it to work(for the most part). As of now, I have a meltano install that is running version 2.8.0. I have added the following taps: • Extractor: tap-mongodb(variant: singer-io) ◦ Runs fine in discovery mode ◦ Able to access oplog table in mongodb for oplog replication • Loader: tap-snowflake(variant: transferwise, File format: CSV) ◦ Able to connect to snowflake(queries showing up in the SF history) ISSUE: • After the pipeline has read data from mongo, tap-snowflake runs the following create table statement: ◦
CREATE TABLE IF NOT EXISTS MELTANO."TABLENAME" (PRIMARY KEY("_ID")) data_retention_time_in_days = 1
, and fails with the error message: ▪︎
SQL compilation error: error line 1 at position 66 invalid identifier '_ID'
▪︎ I manually ran a create statement
CREATE TABLE IF NOT EXISTS MELTANO.TABLENAME (_ID varchar)
in SF(manually, as the SF user above) and was able to create a table, so the user has the correct perms to read/create table ▪︎ HOWEVER, this statement fails: •
CREATE TABLE IF NOT EXISTS MELTANO.TABLENAME ('_ID' varchar)
(OR “_ID”) • So I am wondering if it is the extra quotes are getting added in the snowflake extractor that should not be.(Also checked the SF doc and
ID
is not a reserved keyword) • I have already tried to load the output to json and csv files and both worked fine. TIA, Abhinav
Update: One of the other things I tried was setting up the tap-mongodb with the transferwise variant(considering that the snowflake tap is transferwise). However, with that variant I am not even able to connect to my mongodb. Even though I am using the same uri/user/pass/port as that for the singer-io variant(which works). So haven’t really been able to test out that route. Would really appreciate some help on this 🙏
d
@abhinav_prakash The double quotes around
_ID
seem fine: https://docs.snowflake.com/en/sql-reference/identifiers-syntax.html#label-delimited-identifier. But you’re saying
CREATE TABLE IF NOT EXISTS MELTANO.TABLENAME ("_ID" varchar)
fails with that same “invalid identifier” error?
Does the create table query target-snowflake runs (
CREATE TABLE IF NOT EXISTS MELTANO."TABLENAME" (PRIMARY KEY("_ID")) data_retention_time_in_days = 1
) really not list any specific columns and types? I wonder if the issue is that it references
_ID
to be the primary key, but doesn’t actually specify how to create that column
a
Yea that is what I was wondering and found the github link where you mentioned about explicitly specifying each schema in the meltano.yml file. I am about to test that now.
About your other question, I just tried running this statement `CREATE TABLE IF NOT EXISTS MELTANO.TABLENAME ("_ID" varchar)`and it ran successfully.