# announcements
j
I'm trying to use a tap which doesn't actually have insight into the schema of the source data, so it emits a generic `"schema": {"type": "object"}`. I thought that by specifying the schema extractor extra in meltano.yml I could provide the actual schema of the data in question. However, even after specifying it, the loader crashes when it sees an empty schema message. `meltano select --list --all` returns the table but doesn't list any columns. If my understanding is correct, the metadata+schema are simply passed back to the tap via catalog, so if the tap isn't using them when building the schema messages, that means the tap is at fault, right?
Actually I'm looking at the `tap.properties.json` file generated and I'm not seeing my schema config in there
It does appear to maybe be a Meltano issue: when I put the correct schema in `tap.properties.json` by hand, the tap emitted the correct schema message.
I also confirmed it wasn't a typo in `meltano.yml` - I re-did the schema definition with `meltano config <plugin> set _schema` but it still didn't change `tap.properties.json`
d
Meltano currently only overrides schema for streams/properties that already exist in the discovered catalog. It doesn't add properties to the schema that weren't actually discovered
If the tap isn't discovering a useful catalog, that's one thing, but if it's actually sending empty schema messages to the loader, I'm not sure how it expects the loader to construct the table (or its equivalent)
Even if Meltano would allow you to specify schema info for properties that weren't actually discovered (which I wouldn't be opposed to), that info is actually only fed to the tap, not the target. The tap is responsible for feeding a schema to the target.
Or does the tap explicitly document that you're supposed to manually define your schema in the catalog, which it will then forward to the target?
If so, I'd suggest filing a feature proposal on our issue tracker to not just overwrite, but also append schema properties to the catalog's stream definition, and to have that work for selection and metadata rules too 🙂 All three features currently assume that the discovered catalog defines all streams and their properties
j
the tap is dynamodb which has no documentation at all! 🙃 but based on some experimentation it seems it expects you to set the schema in the catalog definition. Putting it in the catalog doesn't change the tap's discover behavior but it does change its run behavior. I'm not familiar enough with the spec to know if this is in line with the Singer spec, but given this is an official tap I think that maybe appending schema properties to the catalog makes sense. I'll open the issue
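(A rough illustration of the run behavior described above, not the dynamodb tap's actual code: a tap that trusts the catalog would read the stream's `schema` from the catalog entry and forward it to the target as a Singer `SCHEMA` message. The `schema_message` helper, the `orders` stream, and its columns are all made up for this sketch, and reading `key_properties` from the top level of the catalog entry is an assumption.)

```python
import json


def schema_message(catalog: dict, stream_id: str) -> str:
    """Build a Singer SCHEMA message from a catalog entry (hypothetical helper)."""
    for entry in catalog.get("streams", []):
        if entry.get("tap_stream_id") == stream_id:
            return json.dumps({
                "type": "SCHEMA",
                "stream": entry.get("stream", stream_id),
                # Fall back to the generic schema if the catalog has none.
                "schema": entry.get("schema", {"type": "object"}),
                # Assumption: key_properties sits at the top level of the entry.
                "key_properties": entry.get("key_properties", []),
            })
    raise KeyError(f"stream {stream_id!r} not found in catalog")


# Hypothetical hand-edited catalog, like a manually fixed tap.properties.json.
catalog = {
    "streams": [
        {
            "tap_stream_id": "orders",
            "stream": "orders",
            "schema": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "total": {"type": "number"},
                },
            },
            "key_properties": ["id"],
        }
    ]
}

print(schema_message(catalog, "orders"))
```

With a catalog entry that only carries `{"type": "object"}`, the same helper would forward that generic schema, which is what the loader was choking on.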
d
Thanks! The spec allows for taps that don't implement a (useful) discovery mode but still take a catalog, so implementing deep-merge behavior makes sense to me!
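(To make the difference concrete, here is a minimal sketch contrasting the override-only behavior described earlier in the thread with the proposed deep merge. This is not Meltano's implementation: `override_existing`, `deep_merge`, and the example property names are assumptions made purely for illustration.)

```python
from copy import deepcopy


def override_existing(discovered: dict, overrides: dict) -> dict:
    """Apply overrides only to properties the tap already discovered."""
    result = deepcopy(discovered)
    props = result.get("properties", {})
    for name, schema in overrides.items():
        if name in props:  # undiscovered properties are silently skipped
            props[name] = {**props[name], **schema}
    return result


def deep_merge(base: dict, extra: dict) -> dict:
    """Recursively merge extra into base, adding keys that are missing."""
    result = deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# What the tap discovers: a generic schema with no properties.
discovered = {"type": "object"}
# Hypothetical user-supplied property schemas.
overrides = {"id": {"type": "string"}, "total": {"type": "number"}}

print(override_existing(discovered, overrides))
# {'type': 'object'}
print(deep_merge(discovered, {"properties": overrides}))
# {'type': 'object', 'properties': {'id': {'type': 'string'}, 'total': {'type': 'number'}}}
```

With override-only semantics the user's schema never reaches the catalog when discovery returns nothing, whereas the deep merge appends the missing properties.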