Hello Team, I'm using meltano (local - not docker)...
# troubleshooting
p
Hello Team, I'm using the Meltano UI (local, not Docker) to sync data from MongoDB to Redshift. I searched the UI but could not find a way to pick entities and choose replication methods, so I started following the CLI documentation. I ran the pipeline (tap: MongoDB, target: Redshift) that I had configured in the UI, but it ended with the error below.
target-redshift | psycopg2.errors.UndefinedColumn: column "_id" named in key does not exist
After analyzing, I found that the tap produced the schema below:
{"type": "SCHEMA", "stream": "mongodb_document", "schema": {"type": "object"}, "key_properties": ["_id"]}
and target-redshift tries to create a table from this schema. Since "_id" is listed as a key, the target looks for this field in the schema and fails because the schema declares no properties. Question: is it possible to manually update the schema? In Singer there would be a catalog.json, but the Meltano documentation says it is handled internally. Could you please let me know if there is any way to update the schema?
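To make the mismatch concrete, here is a minimal Python sketch that checks a SCHEMA message for key properties the schema never declares. The helper name `missing_key_properties` is hypothetical and is not part of Meltano, the tap, or target-redshift; it only mirrors the check that the error suggests the target effectively performs.

```python
import json


def missing_key_properties(schema_message: dict) -> list:
    """Return the key_properties that are not declared under schema.properties."""
    declared = schema_message.get("schema", {}).get("properties", {})
    return [
        key
        for key in schema_message.get("key_properties", [])
        if key not in declared
    ]


# The SCHEMA message quoted in the question above.
line = (
    '{"type": "SCHEMA", "stream": "mongodb_document", '
    '"schema": {"type": "object"}, "key_properties": ["_id"]}'
)
message = json.loads(line)

print(missing_key_properties(message))  # ['_id']
```

An empty result would mean every key column is declared in the schema; here "_id" is reported because the schema has no "properties" at all.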
t
From https://hub.meltano.com/singer/spec#schemas you can set `key_properties` to an empty list as well to indicate there's no primary key.
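As a rough illustration of that suggestion, the sketch below rewrites a single Singer message line so that SCHEMA messages carry an empty `key_properties` list. The function `clear_key_properties` is a hypothetical helper, not a Meltano or Singer API, and where such a rewrite would sit in a real pipeline is left open.

```python
import json


def clear_key_properties(line: str) -> str:
    """Return the Singer message with key_properties emptied if it is a SCHEMA message."""
    message = json.loads(line)
    if message.get("type") == "SCHEMA":
        # An empty list signals that the stream has no primary key.
        message["key_properties"] = []
    return json.dumps(message)


schema_line = (
    '{"type": "SCHEMA", "stream": "mongodb_document", '
    '"schema": {"type": "object"}, "key_properties": ["_id"]}'
)
patched = json.loads(clear_key_properties(schema_line))
print(patched["key_properties"])  # []
```

Messages of other types pass through with their content unchanged. Without a primary key the target has nothing to deduplicate on, so this trades the error for append-style loading.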
f
I hit the same issue and solved it as follows: 1. Override the schema with the extractor's `schema` extra (https://meltano.com/docs/integration.html#overriding-schemas); Meltano will then generate a correct catalog.json. 2. The default `tap-mongodb` variant is no longer maintained; it always generates the schema from the data rows and does not follow the schema in catalog.json. So I forked and modified `tap-mongodb` to use the specified schema.
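The effect of such an override can be sketched in Python as follows. This is an illustration of the idea only, not Meltano's implementation: `apply_schema_override` is a hypothetical helper, the override dictionary is assumed to be shaped as stream name, then property name, then a JSON Schema fragment, and the "string" type for "_id" is an assumption about the data.

```python
import copy


def apply_schema_override(schema_message: dict, override: dict) -> dict:
    """Return a copy of the SCHEMA message with the overridden properties merged in."""
    patched = copy.deepcopy(schema_message)
    stream_override = override.get(patched["stream"], {})
    properties = patched["schema"].setdefault("properties", {})
    for name, fragment in stream_override.items():
        # Merge the override into any existing definition of the property.
        properties.setdefault(name, {}).update(fragment)
    return patched


original = {
    "type": "SCHEMA",
    "stream": "mongodb_document",
    "schema": {"type": "object"},
    "key_properties": ["_id"],
}
# Assumed override: declare "_id" as a string so the key column exists.
override = {"mongodb_document": {"_id": {"type": ["string"]}}}

patched = apply_schema_override(original, override)
print(patched["schema"]["properties"])  # {'_id': {'type': ['string']}}
```

Once "_id" appears under the schema's properties, the key named in `key_properties` refers to a declared column. As noted above, this only helps if the tap actually honors the schema it is given.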