helder_rossa
03/17/2021, 5:59 PM
`meltano invoke tap-mongodb -d`
2. I’m not sure what to do with the command `meltano select tap-mongodb --list --all`, because it does not show much information.
3. I’ve tried `meltano elt tap-mongodb target-postgres`, but without much success. It does connect, but the Sync Summary is always empty.
Can anyone help me get these ‘final’ steps done? Thanks!!
helder_rossa
03/17/2021, 6:40 PM
meltano | Incremental state has been updated at 2021-03-17 18:37:55.095300.
tap-mongodb | INFO Must complete full table sync before starting oplog replication for eattasty-prd-Allergie
tap-mongodb | INFO Starting full table sync for eattasty-prd-Allergie
meltano | Incremental state has been updated at 2021-03-17 18:37:55.167207.
target-postgres | ERROR Allergie - Table for stream does not exist
tap-mongodb | INFO Querying eattasty-prd-Allergie with:
tap-mongodb | Find Parameters: {'$lte': 'sulphites'}
tap-mongodb | INFO Syncd 14 records for eattasty-prd-Allergie
tap-mongodb | INFO Starting oplog sync for eattasty-prd-Allergie
tap-mongodb | INFO Querying eattasty-prd-Allergie with:
tap-mongodb | Find Parameters: {'ts': {'$gte': Timestamp(1616006266, 1)}}
tap-mongodb | Projection: {'ts': 1, 'ns': 1, 'op': 1, 'o2': 1, 'o': 1}
tap-mongodb | oplog_replay: True
target-postgres | INFO Stream Allergie (allergie) with max_version 1616006275161 targetting 1616006275161
target-postgres | INFO Root table name Allergie
target-postgres | INFO Writing batch with 14 records for `Allergie` with `key_properties`: `['_id']`
target-postgres | INFO METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"count_type": "batch_rows_persisted", "path": ["Allergie"], "database": "postgres", "schema": "test"}}
target-postgres | INFO METRIC: {"type": "timer", "metric": "job_duration", "value": 0.0007932186126708984, "tags": {"job_type": "batch", "path": ["Allergie"], "database": "postgres", "schema": "test", "status": "failed"}}
target-postgres | ERROR Exception writing records
target-postgres | Traceback (most recent call last):
target-postgres | File "/Users/kimus/Develop/eattasty/meltano/mongo2pg/mongo2pg/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/postgres.py", line 295, in write_batch
...
target-postgres | File "/Users/kimus/Develop/eattasty/meltano/mongo2pg/mongo2pg/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/postgres.py", line 309, in write_batch
target-postgres | raise PostgresError(message, ex)
target-postgres | target_postgres.exceptions.PostgresError: ('Exception writing records', KeyError('_id'))
meltano | Loading failed (1): target_postgres.exceptions.PostgresError: ('Exception writing records', KeyError('_id'))
meltano | ELT could not be completed: Loader failed
ELT could not be completed: Loader failed
douwe_maan
03/17/2021, 8:29 PM
`_id` column? Did you make sure this column is selected and extracted? The `KeyError('_id')` we see in the logs suggests that the tap is telling the target to use the `_id` column as the primary key, but the key is actually missing from the extracted records
helder_rossa
03/17/2021, 8:31 PM
douwe_maan
03/17/2021, 8:34 PM
`meltano --log-level=debug` so that we can see all of the SCHEMA and RECORD messages printed?
douwe_maan
03/17/2021, 8:35 PM
douwe_maan
03/17/2021, 8:36 PM
helder_rossa
03/17/2021, 8:36 PM
douwe_maan
03/17/2021, 8:37 PM
helder_rossa
03/17/2021, 8:37 PM
douwe_maan
03/17/2021, 8:37 PM
helder_rossa
03/17/2021, 8:38 PM
loaders:
  - name: target-postgres
    variant: datamill-co
    pip_url: singer-target-postgres
helder_rossa
03/17/2021, 8:38 PM
helder_rossa
03/17/2021, 8:39 PM
douwe_maan
03/17/2021, 8:39 PM
douwe_maan
03/17/2021, 8:40 PM
helder_rossa
03/17/2021, 8:40 PM
douwe_maan
03/17/2021, 8:40 PM
tap-mongodb (out) | {"type": "SCHEMA", "stream": "Allergie", "schema": {"type": "object"}, "key_properties": ["_id"]}
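A minimal sketch of how one might check a Singer `SCHEMA` message for this mismatch, assuming the problem is that `key_properties` names a field the schema does not declare. The helper name `missing_key_properties` is hypothetical and is not part of Meltano, the tap, or the target:

```python
import json


def missing_key_properties(schema_message: dict) -> list:
    """Return the key_properties that the message's schema does not declare."""
    declared = schema_message.get("schema", {}).get("properties", {})
    return [key for key in schema_message.get("key_properties", []) if key not in declared]


# The SCHEMA line from the debug output, without the "tap-mongodb (out) | " prefix.
line = '{"type": "SCHEMA", "stream": "Allergie", "schema": {"type": "object"}, "key_properties": ["_id"]}'
message = json.loads(line)
print(missing_key_properties(message))  # ['_id']
```

An empty list would mean every key property is declared in the schema.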
helder_rossa
03/17/2021, 8:41 PM
helder_rossa
03/17/2021, 8:41 PM
douwe_maan
03/17/2021, 8:42 PM
`key_properties` (`_id`) to actually exist inside the `schema` object, which is empty in this case 😬
douwe_maan
03/17/2021, 8:42 PM
helder_rossa
03/17/2021, 8:42 PM
helder_rossa
03/17/2021, 8:43 PM
douwe_maan
03/17/2021, 8:43 PM
helder_rossa
03/17/2021, 8:44 PM
helder_rossa
03/17/2021, 8:44 PM
helder_rossa
03/17/2021, 8:44 PM
douwe_maan
03/17/2021, 8:44 PM
helder_rossa
03/17/2021, 8:45 PM
douwe_maan
03/17/2021, 8:45 PM
helder_rossa
03/17/2021, 8:46 PM
helder_rossa
03/17/2021, 8:48 PM
douwe_maan
03/17/2021, 8:49 PM
helder_rossa
03/17/2021, 8:49 PM
helder_rossa
03/17/2021, 8:52 PM
- name: tap-mongodb
  variant: pipelinewise
  pip_url: git+https://github.com/transferwise/pipelinewise-tap-mongodb
Do I need to do `meltano install --custom …` instead?
douwe_maan
03/17/2021, 8:52 PM
helder_rossa
03/17/2021, 8:53 PM
helder_rossa
03/17/2021, 8:59 PM
douwe_maan
03/17/2021, 9:00 PM
helder_rossa
03/17/2021, 9:00 PM
helder_rossa
03/17/2021, 9:00 PM
helder_rossa
03/17/2021, 9:04 PM
douwe_maan
03/17/2021, 9:04 PM
helder_rossa
03/17/2021, 9:05 PM
helder_rossa
03/17/2021, 9:05 PM
douwe_maan
03/17/2021, 9:05 PM
helder_rossa
03/17/2021, 9:06 PM
[automatic] eattasty-prd-Address._id
[automatic] eattasty-prd-Address._sdc_deleted_at
[automatic] eattasty-prd-Address.document
helder_rossa
03/17/2021, 9:06 PM
douwe_maan
03/17/2021, 9:06 PM
douwe_maan
03/17/2021, 9:07 PM
douwe_maan
03/17/2021, 9:07 PM
`document` object, which may end up being a single `jsonb` `document` column once loaded with `pipelinewise-target-postgres`
helder_rossa
03/17/2021, 9:08 PM
helder_rossa
03/17/2021, 9:08 PM
helder_rossa
03/17/2021, 9:08 PM
{
  "table_name": "User",
  "stream": "User",
  "metadata": [
    {
      "breadcrumb": [],
      "metadata": {
        "table-key-properties": [
          "_id"
        ],
        "database-name": "eattasty-prd",
        "row-count": 57089,
        "is-view": false,
        "valid-replication-keys": [
          "_id",
          "email"
        ]
      }
    }
  ],
  "tap_stream_id": "eattasty-prd-User",
  "schema": {
    "type": "object",
    "properties": {
      "_id": {
        "type": [
          "string",
          "null"
        ]
      },
      "document": {
        "type": [
          "object",
          "array",
          "string",
          "null"
        ]
      },
      "_sdc_deleted_at": {
        "type": [
          "string",
          "null"
        ]
      }
    }
  }
}
douwe_maan
03/17/2021, 9:09 PM
helder_rossa
03/17/2021, 9:09 PM
helder_rossa
03/17/2021, 9:09 PM
douwe_maan
03/17/2021, 9:10 PM
helder_rossa
03/17/2021, 9:13 PM
douwe_maan
03/17/2021, 9:14 PM
`meltano` variant of `target-postgres`) which doesn't care about an empty initial `SCHEMA`
helder_rossa
03/17/2021, 9:15 PM
douwe_maan
03/17/2021, 9:15 PM
helder_rossa
03/17/2021, 9:16 PM
helder_rossa
03/17/2021, 9:17 PM
helder_rossa
03/17/2021, 9:18 PM
helder_rossa
03/17/2021, 9:18 PM
douwe_maan
03/17/2021, 9:22 PM
helder_rossa
03/17/2021, 10:35 PM
douwe_maan
03/17/2021, 10:35 PM
helder_rossa
03/17/2021, 10:36 PM
douwe_maan
03/17/2021, 10:37 PM
helder_rossa
03/17/2021, 10:40 PM
douwe_maan
03/17/2021, 10:41 PM
helder_rossa
03/17/2021, 10:44 PM
return {
    'table_name': collection_name,
    'stream': collection_name,
    'metadata': metadata.to_list(mdata),
    'tap_stream_id': "{}-{}".format(collection_db_name, collection_name),
    'schema': {
        'type': 'object'
    }
}
helder_rossa
03/17/2021, 10:45 PM
helder_rossa
03/17/2021, 10:45 PM
douwe_maan
03/17/2021, 10:46 PM
`SCHEMA` messages based on sampling, and that logic should really be used in discovery mode as well, and in the very first `SCHEMA` message
helder_rossa
03/17/2021, 10:47 PM
douwe_maan
03/17/2021, 10:48 PM
helder_rossa
03/17/2021, 10:48 PM
douwe_maan
03/17/2021, 10:49 PM
helder_rossa
03/17/2021, 10:50 PM
helder_rossa
03/17/2021, 10:55 PM
helder_rossa
03/17/2021, 11:03 PM
helder_rossa
03/17/2021, 11:31 PM
douwe_maan
03/17/2021, 11:32 PM
helder_rossa
03/17/2021, 11:33 PM
helder_rossa
03/17/2021, 11:33 PM
'schema': {
    'type': 'object',
    'properties': {
        "_id": { 'type': ['null', 'string'] }
    }
}
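Putting the two snippets from this thread together, the discovery entry with the `_id` workaround might look roughly like the sketch below. `build_stream_entry` and its `metadata_list` parameter are hypothetical stand-ins (the tap itself passes `metadata.to_list(mdata)`); only the dictionary shape comes from the snippets in the thread:

```python
def build_stream_entry(collection_db_name, collection_name, metadata_list):
    """Build a catalog entry whose schema declares the `_id` key property."""
    return {
        'table_name': collection_name,
        'stream': collection_name,
        # Assumption: the caller supplies the already-serialised metadata list.
        'metadata': metadata_list,
        'tap_stream_id': "{}-{}".format(collection_db_name, collection_name),
        'schema': {
            'type': 'object',
            'properties': {
                # Declared up front so that key_properties=['_id'] refers to a known field.
                '_id': {'type': ['null', 'string']}
            }
        }
    }


entry = build_stream_entry('eattasty-prd', 'Allergie', [])
print(entry['tap_stream_id'])  # eattasty-prd-Allergie
```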
douwe_maan
03/17/2021, 11:34 PM
helder_rossa
03/17/2021, 11:34 PM
douwe_maan
03/17/2021, 11:34 PM
helder_rossa
03/17/2021, 11:35 PM
douwe_maan
03/17/2021, 11:35 PM
`SCHEMA` message, with only the `_id` field, and ignore any `SCHEMA` messages that follow, as well as any fields with other names in `RECORD` messages
helder_rossa
03/17/2021, 11:35 PM
douwe_maan
03/17/2021, 11:36 PM
`SCHEMA` message, but `tap-mongodb` is not following that rule
helder_rossa
03/17/2021, 11:36 PM
douwe_maan
03/17/2021, 11:36 PM
helder_rossa
03/17/2021, 11:36 PM
helder_rossa
03/17/2021, 11:37 PM
douwe_maan
03/17/2021, 11:37 PM
douwe_maan
03/17/2021, 11:37 PM
helder_rossa
03/17/2021, 11:37 PM
douwe_maan
03/17/2021, 11:37 PM
helder_rossa
03/17/2021, 11:38 PM
douwe_maan
03/17/2021, 11:38 PM
helder_rossa
03/17/2021, 11:38 PM
douwe_maan
03/17/2021, 11:38 PM
douwe_maan
03/17/2021, 11:39 PM
douwe_maan
03/17/2021, 11:39 PM
helder_rossa
03/17/2021, 11:40 PM
tap-mongodb (out) | {"type": "SCHEMA", "stream": "Allergie", "schema": {"properties": {"_id": {"type": ["null", "string"]}}, "type": "object"}, "key_properties": ["_id"]}
helder_rossa
03/17/2021, 11:40 PM
douwe_maan
03/17/2021, 11:41 PM
douwe_maan
03/17/2021, 11:41 PM
helder_rossa
03/17/2021, 11:42 PM
helder_rossa
03/17/2021, 11:43 PM
douwe_maan
03/17/2021, 11:43 PM
`locales` property at some point, but not what was inside it
douwe_maan
03/17/2021, 11:43 PM
`jsonb` column, or some targets may denest it into a separate joined table
helder_rossa
03/17/2021, 11:44 PM
helder_rossa
03/17/2021, 11:45 PM
douwe_maan
03/17/2021, 11:45 PM
douwe_maan
03/17/2021, 11:45 PM
`meltano` variant does: https://github.com/meltano/target-postgres/blob/master/target_postgres/db_sync.py#L64
helder_rossa
03/17/2021, 11:45 PM
douwe_maan
03/17/2021, 11:46 PM
helder_rossa
03/17/2021, 11:47 PM
douwe_maan
03/17/2021, 11:48 PM
`'anyOf': [{}]` it found for `locales` with actual details about its properties. But that should "just" be a matter of running the current schema detection logic recursively
helder_rossa
03/17/2021, 11:49 PM
if common.row_to_schema(schema, row):
    singer.write_message(singer.SchemaMessage(
        stream=common.calculate_destination_stream_name(stream),
        schema=schema,
        key_properties=['_id']))
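The idea of running schema detection recursively, raised just before this snippet, could be sketched as follows. This is a standalone illustration, not the tap's `common.row_to_schema`: `infer_schema` is a hypothetical helper, it looks at a single record only, and it does not merge schemas across rows. The sample record is the `Allergie` row that appears in the thread's debug output:

```python
def infer_schema(value):
    """Recursively describe a decoded JSON value as a JSON Schema fragment."""
    if isinstance(value, dict):
        return {
            "type": ["null", "object"],
            "properties": {key: infer_schema(item) for key, item in value.items()},
        }
    if isinstance(value, list):
        item_schemas = []
        for item in value:
            candidate = infer_schema(item)
            if candidate not in item_schemas:  # keep each distinct shape once
                item_schemas.append(candidate)
        schema = {"type": ["null", "array"]}
        if len(item_schemas) == 1:
            schema["items"] = item_schemas[0]
        elif item_schemas:
            schema["items"] = {"anyOf": item_schemas}
        return schema
    # bool must be tested before int because bool is a subclass of int
    if isinstance(value, bool):
        return {"type": ["null", "boolean"]}
    if isinstance(value, int):
        return {"type": ["null", "integer"]}
    if isinstance(value, float):
        return {"type": ["null", "number"]}
    if isinstance(value, str):
        return {"type": ["null", "string"]}
    return {}  # None or an unknown type: accept anything


record = {
    "_id": "celery",
    "locales": {
        "0": {"lang": "en", "name": "celery free"},
        "1": {"lang": "pt", "name": "sem aipo"},
    },
}
schema = infer_schema(record)
print(schema["properties"]["locales"]["properties"]["0"])
```

With this approach `locales` gets a nested `properties` description instead of an empty `anyOf` entry, at the cost of assuming the first record is representative.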
douwe_maan
03/17/2021, 11:50 PM
helder_rossa
03/17/2021, 11:50 PM
helder_rossa
03/17/2021, 11:53 PM
helder_rossa
03/17/2021, 11:53 PM
douwe_maan
03/17/2021, 11:54 PM
`tap-mongodb (out)` prefix indicating it's actual output going to the target? So the target indeed never gets the full schema?
helder_rossa
03/17/2021, 11:54 PM
douwe_maan
03/17/2021, 11:55 PM
helder_rossa
03/17/2021, 11:56 PM
helder_rossa
03/17/2021, 11:58 PM
helder_rossa
03/17/2021, 11:59 PM
douwe_maan
03/18/2021, 12:01 AM
helder_rossa
03/18/2021, 12:03 AM
tap-mongodb | INFO ++++
tap-mongodb | INFO {'type': 'object', 'properties': {}}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "celery", "locales": {"0": {"lang": "en", "name": "celery free"}, "1": {"lang": "pt", "name": "sem aipo"}}}, "version": 1616023844615, "time_extracted": "2021-03-17T23:30:44.650529Z"}
tap-mongodb | INFO ++++
tap-mongodb | INFO {'type': 'object', 'properties': {'locales': {'anyOf': [{}]}}}
helder_rossa
03/18/2021, 12:03 AM
helder_rossa
03/18/2021, 12:05 AM
douwe_maan
03/18/2021, 12:05 AM
helder_rossa
03/18/2021, 12:06 AM
helder_rossa
03/18/2021, 12:06 AM
douwe_maan
03/18/2021, 12:07 AM
helder_rossa
03/18/2021, 12:14 AM
douwe_maan
03/18/2021, 12:14 AM
helder_rossa
03/18/2021, 12:21 AM
douwe_maan
03/18/2021, 12:21 AM
helder_rossa
03/18/2021, 10:36 AM
{
  "table_name": "Zone",
  "stream": "Zone",
  "metadata": [
    {
      "breadcrumb": [],
      "metadata": {
        "table-key-properties": [
          "_id"
        ],
        "database-name": "eattasty-prd",
        "row-count": 199,
        "is-view": false,
        "valid-replication-keys": [
          "_id"
        ]
      }
    }
  ],
  "tap_stream_id": "eattasty-prd-Zone",
  "schema": {
    "type": "object",
    "properties": {
      "_id": {
        "type": [
          "null",
          "string"
        ]
      },
      "delivery": {
        "anyOf": [
          {}
        ]
      },
      "coordinates": {
        "anyOf": [
          {
            "type": "array",
            "items": {
              "anyOf": [
                {
                  "type": "object",
                  "properties": {
                    "lat": {
                      "anyOf": [
                        {
                          "type": "number"
                        },
                        {}
                      ]
                    },
                    "lng": {
                      "anyOf": [
                        {
                          "type": "number"
                        },
                        {}
                      ]
                    }
                  }
                },
                {}
              ]
            }
          },
          {}
        ]
      }
    }
  }
},
helder_rossa
03/18/2021, 11:59 AM
douwe_maan
03/18/2021, 2:57 PM
helder_rossa
03/18/2021, 3:11 PM
`.args` for details.', [(<ValidationError: 'False is not valid under any of the given schemas'>, {'type': 'RECORD', 'stream': 'Order', 'record': {'_id': '5a9842308257…
douwe_maan
03/18/2021, 3:14 PM
target_postgres.exceptions.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: 'False is not valid under any of the given schemas'>, {'type': 'RECORD', 'stream': 'Order', 'record': {'_id': '5a9842308257a200c3fa5e8b', 'orderdate': '2018-03-02T00:00:00.000000Z', 'modifieddate': '2018-03-02T12:49:25.978000Z', 'payment_status': 'PAID', 'status': 'DELIVERED', 'alerted': True, 'fail': False, 'fail_reason': 'NONE', 'customerId': '5a7840418d350100c22e6445', 'promocodes': ['5a9696bfc270f700c13b9d95'], 'discount': Decimal('5.9'), 'createddate': '2018-03-01T18:11:05.739000Z', 'cutlery': True, 'driverId': '59f70fd1acad7f856ccdfaed', 'delivered': '2018-03-02T12:49:25.978000Z', 'deliveryEnded': '2018-03-02T12:45:00.000000Z', 'areaId': '5d13407be54b0000cf7090b6', 'organizationId': '58f79b4e325a7145ba47e6ce', 'routeId': '58a2fe719874d1aa7e482e95', 'delivery': 'lunch'}, 'version': 1616080261266, 'time_extracted': '2021-03-18T15:11:01.303316Z', '__raw_line_size': 797})])
douwe_maan
03/18/2021, 3:14 PM
helder_rossa
03/18/2021, 3:15 PM
helder_rossa
03/18/2021, 3:15 PM
helder_rossa
03/18/2021, 3:16 PM
helder_rossa
03/18/2021, 3:17 PM
target-postgres | INFO Writing table batch with 109874 rows for `('Order__1616080549258',)`...
douwe_maan
03/18/2021, 3:17 PM
helder_rossa
03/18/2021, 3:25 PM
helder_rossa
03/18/2021, 3:26 PM
target_postgres.exceptions.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: "{'reason': 'WRONG_DAY', 'observations': '', 'promoId': '5d66ee3b0807d200cc647cc5', 'promoCodeValue': '5,90'} is not valid under any of the given schemas">,
helder_rossa
03/18/2021, 3:31 PM
helder_rossa
03/18/2021, 3:32 PM
douwe_maan
03/18/2021, 3:33 PM
helder_rossa
03/18/2021, 3:35 PM
helder_rossa
03/19/2021, 5:31 PM
andrew_stewart
03/30/2021, 4:40 AM
andrew_stewart
03/30/2021, 4:41 AM
`tap-mongodb`, wish I had read this first!
andrew_stewart
03/30/2021, 4:50 AM
{
  "database_url": "mongodb+srv://user:myRealPassword@cluster0.mongodb.net/test?w=majority&tls=true"
}
It also installs `dnspython`, which looks like a necessary dependency for certain MongoDB hosts (like Atlas)
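A small sketch of a pre-flight check for this situation, assuming the dependency matters when the connection string uses the `mongodb+srv://` scheme, as in the config above. The function names are hypothetical; the check uses only the standard library and relies on `dnspython` being importable as the `dns` module:

```python
import importlib.util
from urllib.parse import urlsplit


def uses_srv(database_url: str) -> bool:
    """True when the connection string uses the mongodb+srv:// scheme."""
    return urlsplit(database_url).scheme == "mongodb+srv"


def dnspython_installed() -> bool:
    """True when the `dns` module (provided by dnspython) can be imported."""
    return importlib.util.find_spec("dns") is not None


url = "mongodb+srv://user:myRealPassword@cluster0.mongodb.net/test?w=majority&tls=true"
if uses_srv(url) and not dnspython_installed():
    print("mongodb+srv:// URL configured, but dnspython is not installed")
```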