# plugins-general
a
I’m getting the following error from a `tap-mongodb` to `target-postgres` pipe:
target_postgres.exceptions.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.'
I believe this may be an instance of this issue: https://github.com/datamill-co/target-postgres/issues/114 - where target-postgres doesn’t support `anyOf` types? Curious for any approaches to follow here. I imagine that has something to do with trying to extract varying nested structures from Mongo into Postgres, and therefore more explicit `meltano select` logic may be in order?
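For context, the kind of `meltano select` rules I have in mind would look roughly like this in meltano.yml (stream and field names here are made-up placeholders for my collections):

plugins:
  extractors:
  - name: tap-mongodb
    select:
    - orders.order_id          # keep only a few explicit fields
    - orders.status
    - "!orders.line_items.*"   # exclude a nested structure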
j
I’m not an expert on target-postgres, but for what it’s worth, I saw somewhere that Postgres is very strict about column types, so you have to be quite precise (or broad) with the schema you’re using for your source data.
a
Yeah. I’m not actually necessarily tied to pg here, so maybe I can just try a different target.
j
Well, it’s kind of a best practice with any relational database to define the column types correctly in advance. That’s the thing with relational databases: it’s both their power and their weakness.
a
Oh weird, I switched to `target-jsonl` output and narrowed my `tap-mongodb` select to just a few fields… and still the same exact error; and looking through the logs while playing around, it looks like it always keeps selecting everything no matter what I do. This seems to correspond with the issue I linked to above. Hrmmmmm.
d
@andrew_stewart You may have more luck with another variant of target-postgres: https://meltano.com/plugins/loaders/postgres.html#alternative-variants
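If it helps, switching variants should be something along the lines of (variant name here just as an example):

meltano add loader target-postgres --variant=transferwise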
a
I’ll def try those out. Though I’m getting the same issue with `target-jsonl`. I imagine BigQuery / Snowflake type targets might naturally handle nested structured fields in some graceful way (maybe?)
So here’s a summary of the error that occurs with both targets so far:
target-jsonl       | jsonschema.exceptions.ValidationError: [{ ... }, { ... }, ... { ... }, }] is not valid under any of the given schemas

target-postgres       | target_postgres.exceptions.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: '[{ ... }, { ... }, ... { ... }, }]'
d
Interesting, if it's saying the record isn't valid per the schema, that sounds like a "real" issue, not just anyOf not being supported. Are the records in question described accurately by the schema?
a
I’m still learning Meltano/Singer so I may need a little direction on how to verify that. Are you referring to schema selection as per meltano.yml, or as defined on the Singer side?
d
Ah, no, I'm referring to the SCHEMA message that the target is receiving from the tap, which it uses to validate each RECORD message. The error you're seeing indicates that the RECORD is not valid per the SCHEMA. If you run in debug mode with `meltano --log-level=debug elt`, you should see the specific SCHEMA and RECORD messages printed that result in the error. That should help us verify where the discrepancy originates.
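Something like this should make the relevant messages easier to spot (the grep is just one way to filter the debug output; swap in your own tap/target names):

meltano --log-level=debug elt tap-mongodb target-jsonl 2>&1 | grep -E 'SCHEMA|RECORD'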
a
Gotcha. Yeah, I’ve been running with debug logging, and it’s kind of a mess to parse through because I’m not terribly familiar with the contents of the source DB, and knowing Mongo there are probably nested structural changes in the list of records over time. What’s weird is that I set `meltano select` to project into just a few very specific fields, and yet the tap seems to just pull everything anyway. So there might be an issue there.
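As a sanity check I should probably run something like this to see which entities/attributes Meltano actually considers selected (extractor name as in my config):

meltano select tap-mongodb --list --all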
Something else that looks off is that (at least in some cases) the offending record in the validation error looks like stringified JSON, i.e.:
target_postgres.exceptions.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: '[{\'foo\': \'bar\', \'key\': \'value\', ...
d
@andrew_stewart That almost looks like the record is shaped like an array rather than a dict, which would obviously not be valid. Do you see the corresponding `tap-mongodb (out)` line for that RECORD message? It should have been written just before that error (and its stacktrace).
a
yeah, the preceding RECORD in the log doesn’t even resemble the record in the logged validation error.
d
Ah, depending on how the target processes the records, there may be some buffering, so the faulty record may have come earlier
a
Yeah, I’m scrolling and searching on unique IDs from that error record but not finding it anywhere else in the log. I might need to dump the log into something besides just stdout, which is a bit unwieldy for the amount of spam being produced by the mongodb record contents :D
d
Right 😄 You can find the full elt logs inside `.meltano/logs`
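If you want to hunt for that record ID without scrolling, grepping the log directory recursively should do it; something like (the ID here is a placeholder):

grep -rn "60a1b2c3d4e5f6a7b8c9d0e1" .meltano/logs/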
a
Oh! Nice! I was starting to just pipe them to another file.
So I got this working with some simpler collection streams, which is good to verify end-to-end (w/ both jsonl and postgres targets). So I think I can narrow down the original issue to nested complex structures (arrays etc.)
Is it possible to tell `tap-mongodb` to treat certain fields as json/strings?
d
Possibly, if you override the schema coming from the tap: https://meltano.com/docs/plugins.html#schema-extra
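Roughly something like this in meltano.yml, if I recall the syntax correctly (stream and field names are placeholders); it overrides the property’s type in the SCHEMA message before it reaches the target:

plugins:
  extractors:
  - name: tap-mongodb
    schema:
      orders:                  # stream name (placeholder)
        line_items:            # field to coerce to string (placeholder)
          type: ["string", "null"]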
a
ok right, I recall doing this with one of the csv file taps once.
Woo! @douwe_maan that worked!
(coercing the array types to strings)
And just one final update: there is actually still one stream/collection in the mongodb source where the schema type coercion isn’t working and where the record in the validation error doesn’t appear in the prior records portion of the log. I could create a redacted version of this log file if it would be interesting to review.