# plugins-general
h
Hi, I’m trying to use tap-mongodb, and I already got some steps done, but I’m stuck at the ‘selection’ step and not sure what to do next. 1. I can connect to the database and ‘inspect’ it by using
meltano invoke tap-mongodb -d
2. Not sure what to do with the command `meltano select tap-mongodb --list --all` because it does not show much information. 3. I’ve tried `meltano elt tap-mongodb target-postgres` but without much success. It does connect, but the
Sync Summary
is always empty. Can anyone help me get these final steps done? Thanks!!
After some trial and error I got something …
Copy code
meltano         | Incremental state has been updated at 2021-03-17 18:37:55.095300.
tap-mongodb     | INFO Must complete full table sync before starting oplog replication for eattasty-prd-Allergie
tap-mongodb     | INFO Starting full table sync for eattasty-prd-Allergie
meltano         | Incremental state has been updated at 2021-03-17 18:37:55.167207.
target-postgres | ERROR Allergie - Table for stream does not exist
tap-mongodb     | INFO Querying eattasty-prd-Allergie with:
tap-mongodb     | 	Find Parameters: {'$lte': 'sulphites'}
tap-mongodb     | INFO Syncd 14 records for eattasty-prd-Allergie
tap-mongodb     | INFO Starting oplog sync for eattasty-prd-Allergie
tap-mongodb     | INFO Querying eattasty-prd-Allergie with:
tap-mongodb     | 	Find Parameters: {'ts': {'$gte': Timestamp(1616006266, 1)}}
tap-mongodb     | 	Projection: {'ts': 1, 'ns': 1, 'op': 1, 'o2': 1, 'o': 1}
tap-mongodb     | 	oplog_replay: True
target-postgres | INFO Stream Allergie (allergie) with max_version 1616006275161 targetting 1616006275161
target-postgres | INFO Root table name Allergie
target-postgres | INFO Writing batch with 14 records for `Allergie` with `key_properties`: `['_id']`
target-postgres | INFO METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"count_type": "batch_rows_persisted", "path": ["Allergie"], "database": "postgres", "schema": "test"}}
target-postgres | INFO METRIC: {"type": "timer", "metric": "job_duration", "value": 0.0007932186126708984, "tags": {"job_type": "batch", "path": ["Allergie"], "database": "postgres", "schema": "test", "status": "failed"}}
target-postgres | ERROR Exception writing records
target-postgres | Traceback (most recent call last):
target-postgres |   File "/Users/kimus/Develop/eattasty/meltano/mongo2pg/mongo2pg/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/postgres.py", line 295, in write_batch
...
target-postgres |   File "/Users/kimus/Develop/eattasty/meltano/mongo2pg/mongo2pg/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/postgres.py", line 309, in write_batch
target-postgres |     raise PostgresError(message, ex)
target-postgres | target_postgres.exceptions.PostgresError: ('Exception writing records', KeyError('_id'))
meltano         | Loading failed (1): target_postgres.exceptions.PostgresError: ('Exception writing records', KeyError('_id'))
meltano         | ELT could not be completed: Loader failed
ELT could not be completed: Loader failed
d
@helder_rossa Do your collections have an `_id` column? Did you make sure this column is selected and extracted? The `KeyError('_id')` we see in the logs suggests that the tap is telling the target to use the `_id` column as the primary key, but the key is actually missing from the extracted records
h
@douwe_maan this specific collection has an `_id` column of type string
d
@helder_rossa Can you run again with `meltano --log-level=debug` so that we can see all of the SCHEMA and RECORD messages printed?
I think the issue is that the SCHEMA message doesn't actually describe the fields
I looked into this a little while ago; you're seeing https://gitlab.com/meltano/meltano/-/issues/2517#note_489278113
h
Btw, I’m not sure if I did something wrong, but I just configured some exclusions in the `select:` option.
d
Which target-postgres are you using? the transferwise variant? You may have better luck with the datamill-co or meltano variant (which I used in that issue), or with https://github.com/transferwise/pipelinewise-tap-mongodb instead of https://github.com/singer-io/tap-mongodb as https://github.com/singer-io/tap-mongodb/issues/48#issuecomment-758451877 suggests
h
And I have used the stitchdata service with this database, and it worked fine.
d
I don't think you're doing anything wrong
h
Copy code
loaders:
  - name: target-postgres
    variant: datamill-co
    pip_url: singer-target-postgres
This collection only has 14 records
so, it’s strange that it didn’t figure out the schema 😄
d
The issue appears to be that tap-mongodb sends one empty SCHEMA message before it starts sending real ones, and the target is tripping over that empty one
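To make the failure mode concrete, here is a hypothetical pass-through filter (not part of Meltano or the tap; `filter_empty_schemas` is an invented name) that shows exactly which message the target chokes on, by dropping SCHEMA messages that declare no properties:

```python
import json

def filter_empty_schemas(lines):
    """Yield Singer messages, skipping SCHEMA messages that declare
    no properties, the kind that trips up target-postgres here.
    Purely illustrative; a real fix belongs in the tap itself."""
    for line in lines:
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            yield line  # non-JSON output passes through untouched
            continue
        if msg.get("type") == "SCHEMA" and not msg.get("schema", {}).get("properties"):
            continue  # the empty {"type": "object"} schema from the logs
        yield line

# The empty SCHEMA from the logs above is dropped; records survive:
messages = [
    '{"type": "SCHEMA", "stream": "Allergie", "schema": {"type": "object"}, "key_properties": ["_id"]}',
    '{"type": "RECORD", "stream": "Allergie", "record": {"_id": "celery"}}',
]
survivors = list(filter_empty_schemas(messages))
```

Note that simply dropping the only SCHEMA message would break targets that require one per stream; this sketch only illustrates what the target is tripping over.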
h
```
tap-mongodb (out) | {"type": "STATE", "value": {"bookmarks": {"eattasty-prd-Allergie": {"last_replication_method": "LOG_BASED"}}, "currently_syncing": "eattasty-prd-Allergie"}}
tap-mongodb (out) | {"type": "SCHEMA", "stream": "Allergie", "schema": {"type": "object"}, "key_properties": ["_id"]}
target-postgres (out) | {"bookmarks": {"eattasty-prd-Allergie": {"last_replication_method": "LOG_BASED"}}, "currently_syncing": "eattasty-prd-Allergie"}
meltano | INFO Incremental state has been updated at 2021-03-17 203949.421761.
meltano | DEBUG Incremental state: {'bookmarks': {'eattasty-prd-Allergie': {'last_replication_method': 'LOG_BASED'}}, 'currently_syncing': 'eattasty-prd-Allergie'}
tap-mongodb | INFO Must complete full table sync before starting oplog replication for eattasty-prd-Allergie
tap-mongodb | INFO Starting full table sync for eattasty-prd-Allergie
tap-mongodb (out) | {"type": "STATE", "value": {"bookmarks": {"eattasty-prd-Allergie": {"last_replication_method": "LOG_BASED", "oplog_ts_time": 1616013589, "oplog_ts_inc": 1, "version": 1616013589491}}, "currently_syncing": "eattasty-prd-Allergie"}}
tap-mongodb (out) | {"type": "ACTIVATE_VERSION", "stream": "Allergie", "version": 1616013589491}
target-postgres (out) | {"bookmarks": {"eattasty-prd-Allergie": {"last_replication_method": "LOG_BASED", "oplog_ts_time": 1616013589, "oplog_ts_inc": 1, "version": 1616013589491}}, "currently_syncing": "eattasty-prd-Allergie"}
meltano | INFO Incremental state has been updated at 2021-03-17 203949.497529.
meltano | DEBUG Incremental state: {'bookmarks': {'eattasty-prd-Allergie': {'last_replication_method': 'LOG_BASED', 'oplog_ts_time': 1616013589, 'oplog_ts_inc': 1, 'version': 1616013589491}}, 'currently_syncing': 'eattasty-prd-Allergie'}
target-postgres | ERROR Allergie - Table for stream does not exist
tap-mongodb | INFO Querying eattasty-prd-Allergie with:
tap-mongodb | Find Parameters: {'$lte': 'sulphites'}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "celery", "locales": {"0": {"lang": "en", "name": "celery free"}, "1": {"lang": "pt", "name": "sem aipo"}}}, "version": 1616013589491, "time_extracted": "2021-03-17T203949.526536Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "crustaceans", "locales": {"0": {"lang": "en", "name": "crustaceans free"}, "1": {"lang": "pt", "name": "sem crust\u00e1ceos"}}}, "version": 1616013589491, "time_extracted": "2021-03-17T203949.526536Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "dairy", "locales": {"0": {"lang": "en", "name": "lactose free"}, "1": {"lang": "pt", "name": "sem lactose"}}}, "version": 1616013589491, "time_extracted": "2021-03-17T203949.526536Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "egg", "locales": {"0": {"lang": "en", "name": "egg free"}, "1": {"lang": "pt", "name": "sem ovo"}}}, "version": 1616013589491, "time_extracted": "2021-03-17T203949.526536Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "fish", "locales": {"0": {"lang": "en", "name": "fish free"}, "1": {"lang": "pt", "name": "sem peixe"}}}, "version": 1616013589491, "time_extracted": "2021-03-17T203949.526536Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "gluten", "locales": {"0": {"lang": "en", "name": "gluten free"}, "1": {"lang": "pt", "name": "sem glutten"}}}, "version": 1616013589491, "time_extracted": "2021-03-17T203949.526536Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "lupine", "locales": {"0": {"lang": "en", "name": "lupine free"}, "1": {"lang": "pt", "name": "sem tremo\u00e7o"}}}, "version": 1616013589491, "time_extracted": "2021-03-17T203949.526536Z"}
tap-mongodb (out) | {"type": "RECORD",…
```
That confirms the issue is with the first SCHEMA message:
Copy code
tap-mongodb (out)     | {"type": "SCHEMA", "stream": "Allergie", "schema": {"type": "object"}, "key_properties": ["_id"]}
h
what’s wrong? 🙂 most of the catalog is like that
I’ll try the other plugin variants then
d
Yep, but the target expects the property identified by `key_properties` (`_id`) to actually exist inside the `schema` object, which is empty in this case 😬
So the tap is behaving in a way that's incompatible with what targets expect. This is the kind of inconsistency we're going to address with https://gitlab.com/meltano/singer-sdk 🙂
h
type object, could have ‘any’ key
object => { _id: … }
d
Right, but the target wants to know what columns it should create in the new table, and it gets confused when that's missing 😄 tap-mongodb sends more complete SCHEMA messages once it has seen some records, but the target falls over the first, empty SCHEMA message
h
ok… so discovery is wrong
it’s getting the schema wrong
I’ll test the other variant then
d
👍
h
I’m surprised that anyone got this running. it’s just a simple collection :-|
d
This tap-mongodb definitely isn't documented as clearly as it should be; your issue will help fix that 🙂
h
I hope I can make it work. And with that, the documentation could be improved too.
I also took a look at pipelinewise today… let’s check that out then 😄
d
Their taps and targets are great, but Meltano is the best way to run them 😄
h
😉
this does not work:
Copy code
- name: tap-mongodb
  variant: pipelinewise
  pip_url: git+https://github.com/transferwise/pipelinewise-tap-mongodb
do I need to do `meltano add --custom …` instead?
d
Correct, Meltano doesn't know that variant yet
h
ok
nice, this already has auth_database
d
Great! Sounds like we may want to make that the new default variant in Meltano 🙂
h
maybe..
not started testing yet 😛
ahh… select --list now works
d
Great
h
I was thinking I was dumb 😛
nothing seemed to work
d
Sounds like the issue was with that variant of tap-mongodb, and this one's much better!
h
not sure what this is:
Copy code
[automatic] eattasty-prd-Address._id
	[automatic] eattasty-prd-Address._sdc_deleted_at
	[automatic] eattasty-prd-Address.document
_sdc_* is from stitchdata migration
d
Interesting
The pipelinewise tap probably uses that same prefix for metadata columns
That makes it look like the entire document will be extracted in a single `document` object, which may end up being a single `jsonb` `document` column once loaded with pipelinewise-target-postgres
h
I hope not 😄
do I need to test it?
Copy code
{
  "table_name": "User",
  "stream": "User",
  "metadata": [
    {
      "breadcrumb": [],
      "metadata": {
        "table-key-properties": [
          "_id"
        ],
        "database-name": "eattasty-prd",
        "row-count": 57089,
        "is-view": false,
        "valid-replication-keys": [
          "_id",
          "email"
        ]
      }
    }
  ],
  "tap_stream_id": "eattasty-prd-User",
  "schema": {
    "type": "object",
    "properties": {
      "_id": {
        "type": [
          "string",
          "null"
        ]
      },
      "document": {
        "type": [
          "object",
          "array",
          "string",
          "null"
        ]
      },
      "_sdc_deleted_at": {
        "type": [
          "string",
          "null"
        ]
      }
    }
  }
}
d
Yeah looks like that's the behavior it'd show...
h
I don’t want this
need to check another tap then?
d
Let's go back to the other tap and try the workaround I described in https://gitlab.com/meltano/meltano/-/issues/2517 that uses https://meltano.com/docs/plugins.html#schema-extra to make the schema explicit, if that's an option for you
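For reference, that schema-extra workaround would mean declaring the stream's columns by hand in meltano.yml. A rough sketch (the stream key and property types here are guesses based on the logs above, and may need adjusting to match what the tap actually reports):

```yaml
plugins:
  extractors:
    - name: tap-mongodb
      schema:
        eattasty-prd-Allergie:
          _id:
            type: ["string", "null"]
          locales:
            type: ["object", "null"]
```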
h
hum, this means that I need to explicitly put in the schema by hand?
d
OK, easiest next option: try out the original tap-mongodb with https://meltano.com/plugins/loaders/postgres--meltano.html (the `meltano` variant of `target-postgres`), which doesn't care about an empty initial SCHEMA
h
or fork the tap-mongodb and fix it 🙂
d
Or that 😄
h
I’ve got an output of the catalog, and some tables are ‘ok’, some not
I’ll try a different target… and then think about the options that I have 😐
I’ll try again later/tomorrow
many thanks
d
Glad I could be of help, we'll figure it out tomorrow!
h
@douwe_maan it seems that pipelinewise handles ‘document’ and creates multiple columns and metadata columns. https://transferwise.github.io/pipelinewise/user_guide/metadata_columns.html Also, it handles schema changes: https://transferwise.github.io/pipelinewise/user_guide/schema_changes.html
d
Ah, cool! All of that also applies when using their taps and targets with Meltano instead of their own runner
h
but it only works when the extractor and loader are both pipelinewise, right?
d
Yep
h
extractors and loaders should be interchangeable
d
Yep, they should be, and we're arguably looking at a bug in tap-mongodb here since targets aren't expected to be able to handle SCHEMA messages that don't actually list any properties
h
looking at the code from tap-mongodb and what I see is:
Copy code
return {
    'table_name': collection_name,
    'stream': collection_name,
    'metadata': metadata.to_list(mdata),
    'tap_stream_id': "{}-{}".format(collection_db_name, collection_name),
    'schema': {
        'type': 'object'
    }
}
the `schema` property is always just type `object` for all collections
it only looks at ‘keys’ (aka indexes)
d
Right, which breaks the expectations described in https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md#schemas. Which is why one solution I suggested above is to explicitly define the schema, but that's obviously not ideal. https://github.com/singer-io/tap-mongodb/pull/40 lets the tap dynamically generate `SCHEMA` messages based on sampling, and that logic should really be used in discovery mode as well, and in the very first `SCHEMA` message
h
that’s in the master branch
d
Right, but it's only used for later SCHEMA messages, not for discovery mode or the first SCHEMA message
h
ok, so in theory it could be reused in both
d
Yep, but I haven't looked into the code to see how easy/hard that would be
h
ok, it’s in `common`… and the code for the SCHEMA is in `__init__`
so, it should collect a sample of the database, e.g. 1000 records, and try to figure out the schema by iterating over each row
I would need to take a better look, but in oplog, for example, it does call common.row_to_schema() in the sync_collection method
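That sampling idea can be sketched as a toy inference pass (illustrative only, not the tap's actual `row_to_schema`; real BSON types like ObjectId, Timestamp or Decimal128 would need extra mapping):

```python
def infer_type(value):
    """Map a Python value to a JSON Schema type name (naive)."""
    if isinstance(value, bool):  # check bool before int/float: bool is an int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "null"

def infer_schema(sample_docs):
    """Build a flat JSON Schema by iterating over a sample of documents,
    accumulating every type observed for each top-level field."""
    properties = {}
    for doc in sample_docs:
        for key, value in doc.items():
            types = properties.setdefault(key, {"type": ["null"]})["type"]
            observed = infer_type(value)
            if observed not in types:
                types.append(observed)
    return {"type": "object", "properties": properties}
```

With a sample of, say, 1000 documents this would at least give the target real column names to create, even if nested structure is still summarized as `object`.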
@douwe_maan
d
Is that victory I smell?
h
it’s just a test… but it sort of worked
just added the column for testing
Copy code
'schema': {
    'type': 'object',
    'properties': {
        "_id": { 'type': ['null', 'string'] }
    }
}
d
Was that enough to fix it? 🤔
h
but it only syncs that, I’m afraid
d
Hmm
h
so, I need to check ‘when’ postgres receives the schema and why it’s not updated
d
I think the target may create the table when it receives the first `SCHEMA` message, with only the `_id` field, and ignore any `SCHEMA` messages that follow, as well as any fields with other names in `RECORD` messages
h
I’m new here… it’s the first time I’m using this
d
Which is correct behavior, since each stream is only supposed to have a single `SCHEMA` message, but `tap-mongodb` is not following that rule
h
the schema is updating in the mongodb tap
d
I appreciate your patience 🙂 We'll figure it out!
h
humm… but to figure out the right schema, and because this is mongodb 😛
we need to do a first pass on the rows to figure out the ‘better’ schema
d
Yep
And never send an empty SCHEMA
h
so, the only one that counts is the first message…
d
Correct
h
so, it’s ‘dumb’ to update it for each row like it does?
d
Yeah, some targets may know how to deal with that, but targets are not required to and most don't, since the spec only describes one SCHEMA message per stream
d
So the tap may have worked with the target it was developed alongside, but not the ones we're using
Does that immediately write the SCHEMA as well?
You should be able to see with `meltano --log-level=debug elt ...` how often SCHEMA messages come along
h
if this is the msg:
Copy code
tap-mongodb (out)     | {"type": "SCHEMA", "stream": "Allergie", "schema": {"properties": {"_id": {"type": ["null", "string"]}}, "type": "object"}, "key_properties": ["_id"]}
then… only once
d
No more schemas after that? Then the target never learns about the real properties, even though the "watching rows to determine the schema" approach is implemented
That's weird
h
+++ INFO … it’s mine for ‘debugging’
```
tap-mongodb (out) | {"type": "STATE", "value": {"bookmarks": {"eattasty-prd-Allergie": {"last_replication_method": "LOG_BASED", "oplog_ts_time": 1616023844, "oplog_ts_inc": 1, "version": 1616023844615}}, "currently_syncing": "eattasty-prd-Allergie"}}
tap-mongodb (out) | {"type": "ACTIVATE_VERSION", "stream": "Allergie", "version": 1616023844615}
target-postgres (out) | {"bookmarks": {"eattasty-prd-Allergie": {"last_replication_method": "LOG_BASED", "oplog_ts_time": 1616023844, "oplog_ts_inc": 1, "version": 1616023844615}}, "currently_syncing": "eattasty-prd-Allergie"}
meltano | INFO Incremental state has been updated at 2021-03-17 233044.620836.
meltano | DEBUG Incremental state: {'bookmarks': {'eattasty-prd-Allergie': {'last_replication_method': 'LOG_BASED', 'oplog_ts_time': 1616023844, 'oplog_ts_inc': 1, 'version': 1616023844615}}, 'currently_syncing': 'eattasty-prd-Allergie'}
target-postgres | ERROR Allergie - Table for stream does not exist
tap-mongodb | INFO Querying eattasty-prd-Allergie with:
tap-mongodb | Find Parameters: {'$lte': 'sulphites'}
tap-mongodb | INFO ++++
tap-mongodb | INFO {'type': 'object', 'properties': {}}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "celery", "locales": {"0": {"lang": "en", "name": "celery free"}, "1": {"lang": "pt", "name": "sem aipo"}}}, "version": 1616023844615, "time_extracted": "2021-03-17T233044.650529Z"}
tap-mongodb | INFO ++++
tap-mongodb | INFO {'type': 'object', 'properties': {'locales': {'anyOf': [{}]}}}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "crustaceans", "locales": {"0": {"lang": "en", "name": "crustaceans free"}, "1": {"lang": "pt", "name": "sem crust\u00e1ceos"}}}, "version": 1616023844615, "time_extracted": "2021-03-17T233044.650529Z"}
tap-mongodb | INFO ++++
tap-mongodb | INFO {'type': 'object', 'properties': {'locales': {'anyOf': [{}]}}}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "dairy", "locales": {"0": {"lang": "en", "name": "lactose free"}, "1": {"lang": "pt", "name": "sem lactose"}}}, "version": 1616023844615, "time_extracted": "2021-03-17T233044.650529Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "egg", "locales": {"0": {"lang": "en", "name": "egg free"}, "1": {"lang": "pt", "name": "sem ovo"}}}, "version": 1616023844615, "time_extracted": "2021-03-17T233044.650529Z"}
tap-mongodb | INFO ++++
tap-mongodb | INFO {'type': 'object', 'properties': {'locales': {'anyOf': [{}]}}}
tap-mongodb | INFO ++++
tap-mongodb | INFO {'type': 'object', 'properties': {'locales': {'anyOf': [{}]}}}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "fish", "locales": {"0": {"lang": "en", "name": "fish free"}, "1": {"lang": "pt", "name": "sem peixe"}}}, "version": 1616023844615, "time_extracted": "2021-03-17T233044.650529Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "gluten", "locales": {"0": {"lang": "en", "name": "gluten free"}, "1": {"lang": "pt", "name": "sem glutten"}}}, "version": 1616023844615, "time_extracted": "2021-03-17T233044.650529Z"}
tap-mongodb | INFO ++++
tap-mongodb | INFO {'type': 'object', 'properties': {'locales': {'anyOf': [{}]}}}
tap-mongodb | INFO ++++
tap-mongodb | INFO {'type': 'object', 'properties': {'locales': {'anyOf': [{}]}}}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "lupine", "locales": {"0": {"lang": "en", "name": "lupine free"}, "1": {"lang": "pt", "name": "sem tremo\u00e7o"}}}, "version": 1616023844615, "time_extracted": "2021-03-17T233044.650529Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Allergie", "rec…
```
so, the ‘properties’ of the schema var are changing… but I’m not sure what for
d
It did find the `locales` property at some point, but not what was inside it
The target should be able to turn that into a `jsonb` column, or some targets may denest it into a separate joined table
h
this is the output of the tap-mongodb from stitchdata
it creates and flattens the array
d
Ah all right, I'm sure one of the target-postgres variants has the same behavior 😅
h
right… postgres… the names are confusing 😄
d
So if you fix the tap to no longer send an empty schema, it should work fine with the meltano target!
h
ok… I could spend some time on it if it’s ‘only’ that
d
There's a good chance the tap would also need to "fill in" the `'anyOf': [{}]` it found for `locales` with actual details about its properties. But that should "just" be a matter of running the current schema detection logic recursively
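Running that detection recursively, instead of stopping at `'anyOf': [{}]`, could look roughly like this sketch (invented code, not taken from the tap):

```python
def refine(node, value):
    """Recursively refine a schema node in place from one observed value,
    descending into nested objects and arrays instead of leaving
    empty placeholders like {'anyOf': [{}]}."""
    if isinstance(value, bool):  # bool before number: bool is an int subclass
        node["type"] = "boolean"
    elif isinstance(value, (int, float)):
        node["type"] = "number"
    elif isinstance(value, str):
        node["type"] = "string"
    elif isinstance(value, dict):
        node["type"] = "object"
        props = node.setdefault("properties", {})
        for key, child in value.items():
            refine(props.setdefault(key, {}), child)
    elif isinstance(value, list):
        node["type"] = "array"
        items = node.setdefault("items", {})
        for child in value:
            refine(items, child)
    else:
        node["type"] = "null"

# One of the 'locales' documents from the logs, fully described:
schema = {}
refine(schema, {"_id": "celery", "locales": {"0": {"lang": "en", "name": "celery free"}}})
```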
h
isn’t this a message of type SCHEMA?
Copy code
if common.row_to_schema(schema, row):
    singer.write_message(singer.SchemaMessage(
        stream=common.calculate_destination_stream_name(stream),
        schema=schema,
        key_properties=['_id']))
d
it is
h
it’s per row
ok, there’s an initial message with STATE and SCHEMA
then that one never goes to the logs
d
It never gets printed with the `tap-mongodb (out)` prefix indicating it's actual output going to the target? So the target indeed never gets the full schema?
h
in the log that I sent you, I do not see any SCHEMA message
d
Nor do I 😕 I wonder what's stopping that line from actually executing. Maybe additional INFO logs will help figure that out? Or drop into a pdb
h
I could put a LOGGER there
so, it never enters there: `if common.row_to_schema(schema, row):` must always be false
so, row_to_schema is saying that it did not change anything
d
😒
h
it’s a bit odd because it did change:
Copy code
tap-mongodb           | INFO ++++
tap-mongodb           | INFO {'type': 'object', 'properties': {}}
tap-mongodb (out)     | {"type": "RECORD", "stream": "Allergie", "record": {"_id": "celery", "locales": {"0": {"lang": "en", "name": "celery free"}, "1": {"lang": "pt", "name": "sem aipo"}}}, "version": 1616023844615, "time_extracted": "2021-03-17T23:30:44.650529Z"}
tap-mongodb           | INFO ++++
tap-mongodb           | INFO {'type': 'object', 'properties': {'locales': {'anyOf': [{}]}}}
at least once
ok, but the ‘problem’ is the initial SCHEMA that’s empty, right?
d
Correct
h
is there any documentation on how the ‘schema’ should look?
just type and format and properties?
h
ok it’s a JSON Schema…
d
That's right
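Concretely, the `schema` field of a SCHEMA message is plain JSON Schema, and the invariant the target relied on (every name in `key_properties` declared under `properties`) can be checked mechanically. A sketch; `key_properties_declared` is an invented helper, not a Singer API:

```python
def key_properties_declared(msg):
    """True if every key property is actually described in the schema,
    the expectation that the empty initial SCHEMA message violated."""
    declared = msg.get("schema", {}).get("properties", {})
    return all(key in declared for key in msg.get("key_properties", []))

# A well-formed message (property names taken from the logs above):
good = {
    "type": "SCHEMA",
    "stream": "Allergie",
    "schema": {
        "type": "object",
        "properties": {
            "_id": {"type": ["null", "string"]},
            "locales": {"type": ["null", "object"]},
        },
    },
    "key_properties": ["_id"],
}

# The empty message from the logs fails the same check:
bad = {
    "type": "SCHEMA",
    "stream": "Allergie",
    "schema": {"type": "object"},
    "key_properties": ["_id"],
}
```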
h
ok, I’m going to sleep 🙂 I’ll try to check if I can do something tomorrow
d
Sleep well!
h
@douwe_maan sort of working… too many anyOf for me 😄
Copy code
{
      "table_name": "Zone",
      "stream": "Zone",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "table-key-properties": [
              "_id"
            ],
            "database-name": "eattasty-prd",
            "row-count": 199,
            "is-view": false,
            "valid-replication-keys": [
              "_id"
            ]
          }
        }
      ],
      "tap_stream_id": "eattasty-prd-Zone",
      "schema": {
        "type": "object",
        "properties": {
          "_id": {
            "type": [
              "null",
              "string"
            ]
          },
          "delivery": {
            "anyOf": [
              {}
            ]
          },
          "coordinates": {
            "anyOf": [
              {
                "type": "array",
                "items": {
                  "anyOf": [
                    {
                      "type": "object",
                      "properties": {
                        "lat": {
                          "anyOf": [
                            {
                              "type": "number"
                            },
                            {}
                          ]
                        },
                        "lng": {
                          "anyOf": [
                            {
                              "type": "number"
                            },
                            {}
                          ]
                        }
                      }
                    },
                    {}
                  ]
                }
              },
              {}
            ]
          }
        }
      }
    },
Output on the Allergie table
d
That looks like success!
h
@douwe_maan not sure if I can call this a success.
```
tap-mongodb (out) | {"type": "RECORD", "stream": "Order", "record": {"_id": "5a993a3bfbcb7a00c8190de3", "orderdate": "2018-03-02T000000.000000Z", "modifieddate": "2018-03-02T122355.713000Z", "payment_status": "PAID", "status": "DELIVERED", "alerted": true, "fail": false, "fail_reason": "NONE", "customerId": "597f04ee5431d600a6b35dff", "createddate": "2018-03-02T114926.995000Z", "cutlery": false, "driverId": "5899f0ef9874d1aa7e323a03", "delivered": "2018-03-02T124457.947000Z", "deliveryEnded": "2018-03-02T124500.000000Z", "areaId": "5d13407be54b0000cf7090b6", "organizationId": "594a8aa2acad7f856c48e01e", "routeId": "5ddfa7b81cf91300e36daf9d", "delivery": "lunch"}, "version": 1616080261266, "time_extracted": "2021-03-18T151101.303316Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Order", "record": {"_id": "5a993aabbf662e00bb51b019", "orderdate": "2018-03-02T000000.000000Z", "modifieddate": "2018-03-02T122204.560000Z", "payment_status": "PAID", "status": "DELIVERED", "alerted": true, "fail": false, "fail_reason": "NONE", "customerId": "595f669e2052ba00a5fecc9c", "createddate": "2018-03-02T115114.652000Z", "cutlery": true, "driverId": "5a464e64ad17d8a721e63135", "delivered": "2018-03-02T122204.560000Z", "deliveryEnded": "2018-03-02T124500.000000Z", "areaId": "5d13407be54b0000cf7090b6", "organizationId": "58170b1d28c9082c04ff0fde", "routeId": "5d2d99691a1c8000c98ead5f", "delivery": "lunch"}, "version": 1616080261266, "time_extracted": "2021-03-18T151101.303316Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Order", "record": {"_id": "5a993aaefbcb7a00c8190de7", "orderdate": "2018-03-06T000000.000000Z", "modifieddate": "2018-03-06T123450.919000Z", "payment_status": "PAID", "status": "DELIVERED", "alerted": true, "fail": false, "fail_reason": "NONE", "customerId": "5a70a186e769e200c075879d", "promocodes": ["5a9937debf662e00bb51b012"], "discount": 5.9, "createddate": "2018-03-02T115115.711000Z", "cutlery": true, "driverId": "59f70fd1acad7f856ccdfaed", "delivered": "2018-03-06T123450.919000Z", "deliveryEnded": "2018-03-06T131500.000000Z", "areaId": "5d13407be54b0000cf7090b6", "organizationId": "576baf2524748e8060000000", "routeId": "5cd461f6799fa00152397fcf", "delivery": "lunch"}, "version": 1616080261266, "time_extracted": "2021-03-18T151101.303316Z"}
tap-mongodb (out) | {"type": "RECORD", "stream": "Order", "record": {"_id": "5a993ac0fbcb7a00c8190de9", "orderdate": "2018-03-07T000000.000000Z", "modifieddate": "2018-03-07T124504.929000Z", "payment_status": "PAID", "status": "DELIVERED", "alerted": true, "fail": false, "fail_reason": "NONE", "customerId": "5a70a186e769e200c075879d", "promocodes": ["5a9937debf662e00bb51b012"], "discount": 5.9, "createddate": "2018-03-02T115136.369000Z", "cutlery": true, "driverId": "59f70fd1acad7f856ccdfaed", "delivered": "2018-03-07T124504.928000Z", "deliveryEnded": "2018-03-07T131500.000000Z", "areaId": "5d13407be54b0000cf7090b6", "organizationId": "576baf2524748e8060000000", "routeId": "5cd461f6799fa00152397fcf", "delivery": "lunch"}, "version": 1616080261266, "time_extracted": "2021-03-18T151101.303316Z"}
meltano | DEBUG Deleted configuration at /Users/kimus/Develop/eattasty/meltano/mongo2pg/mongo2pg/.meltano/run/elt/2021-03-18T151057--tap-mongodb--target-postgres/02a4934f-5255-4762-9475-fb1a22bfb4e0/target.config.json
meltano | DEBUG Deleted configuration at /Users/kimus/Develop/eattasty/meltano/mongo2pg/mongo2pg/.meltano/run/elt/2021-03-18T151057--tap-mongodb--target-postgres/02a4934f-5255-4762-9475-fb1a22bfb4e0/tap.config.json
meltano | ERROR Loading failed (1): target_postgres.exceptions.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: 'False is not valid under any of the given schemas'>, {'type': 'RECORD', 'stream': 'Order', 'record': {'_id': '5a9842308257…
```
d
All right:
Copy code
target_postgres.exceptions.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: 'False is not valid under any of the given schemas'>, {'type': 'RECORD', 'stream': 'Order', 'record': {'_id': '5a9842308257a200c3fa5e8b', 'orderdate': '2018-03-02T00:00:00.000000Z', 'modifieddate': '2018-03-02T12:49:25.978000Z', 'payment_status': 'PAID', 'status': 'DELIVERED', 'alerted': True, 'fail': False, 'fail_reason': 'NONE', 'customerId': '5a7840418d350100c22e6445', 'promocodes': ['5a9696bfc270f700c13b9d95'], 'discount': Decimal('5.9'), 'createddate': '2018-03-01T18:11:05.739000Z', 'cutlery': True, 'driverId': '59f70fd1acad7f856ccdfaed', 'delivered': '2018-03-02T12:49:25.978000Z', 'deliveryEnded': '2018-03-02T12:45:00.000000Z', 'areaId': '5d13407be54b0000cf7090b6', 'organizationId': '58f79b4e325a7145ba47e6ce', 'routeId': '58a2fe719874d1aa7e482e95', 'delivery': 'lunch'}, 'version': 1616080261266, 'time_extracted': '2021-03-18T15:11:01.303316Z', '__raw_line_size': 797})])
h
no, that’s fine… they need to be valid records
fixed… running again
I see plenty of ‘record’ messages, where do they go? no queue or database?
success!
Copy code
target-postgres       | INFO Writing table batch with 109874 rows for `('Order__1616080549258',)`...
d
Meltano passes the RECORD message straight from the tap to the target. Targets typically do their own batching
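That target-side batching can be pictured with a toy loop (illustrative only; real targets also validate records against the schema and flush on time as well as size):

```python
def batch_records(messages, batch_size=100):
    """Collect RECORD messages into batches and yield each full batch,
    plus a final partial batch, mimicking how a target buffers rows
    before writing them to the database in one statement."""
    batch = []
    for msg in messages:
        if msg.get("type") != "RECORD":
            continue  # STATE/SCHEMA handling omitted in this sketch
        batch.append(msg["record"])
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Three records with a batch size of two give two writes:
stream = [{"type": "RECORD", "record": {"_id": i}} for i in range(3)]
batches = list(batch_records(stream, batch_size=2))
```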
h
still need to check these errors to get this done:
Copy code
target_postgres.exceptions.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: "{'reason': 'WRONG_DAY', 'observations': '', 'promoId': '5d66ee3b0807d200cc647cc5', 'promoCodeValue': '5,90'} is not valid under any of the given schemas">,
so, increasing the sample of records to 1000 will resolve this. It’s better than yesterday… but…
and I’m guessing target-postgres does not handle schema changes
d
It does
h
I don’t see the ‘new’ columns in the table… but ok, good
@douwe_maan I’m migrating all the databases, and so far so good. Millions of records are getting to postgres from mongodb! Thanks a lot for your help. Some questions still need to be clarified, like this one, but I’m much more confident now 😄 I would also like to get `select --list` working, it wasn’t. I’ll get into it in my fork. Thanks again!
a
Wow, this is great!
I spent a few hours trying to figure out `tap-mongodb`, wish I had read this first!
Another fork you might be interested in is https://github.com/Tolsto/tap-mongodb , which adds the option to specify a database URL instead of individual config components:
Copy code
{
  "database_url": "mongodb+srv://user:myRealPassword@cluster0.mongodb.net/test?w=majority&tls=true"
}
It also installs `dnspython`, which looks like a necessary dependency for certain mongodb hosts (like Atlas)