I'm trying to use a tap which doesn't actually have insight Meltano #announcements

julian_knight

08/05/2020, 6:07 PM

I'm trying to use a tap which doesn't actually have insight into the schema of the source data, so it emits a generic "schema": {"type": "object"}. I thought that by specifying the schema extractor extra in meltano.yml I could provide the actual schema of the data in question. However, even after I specified it, the loader crashes when it sees an empty schema message. Running meltano select --list --all returns the table but doesn't list any columns. If my understanding is correct, the metadata and schema are simply passed back to the tap via the catalog, so if the tap isn't using them when building the schema messages, that means the tap is at fault, right?

julian_knight

08/05/2020, 6:20 PM

Actually, I'm looking at the generated tap.properties.json file and I'm not seeing my schema config in there.

julian_knight

08/05/2020, 6:31 PM

It does appear to maybe be a Meltano issue: when I put the correct schema in tap.properties.json, the tap emitted the correct schema message.

julian_knight

08/05/2020, 6:53 PM

I also confirmed it wasn't a typo in meltano.yml: I re-did the schema definition with meltano config <plugin> set _schema, but it still didn't change tap.properties.json.

douwe_maan

08/05/2020, 7:34 PM

Meltano currently only overrides schema for streams/properties that already exist in the discovered catalog; it doesn't add properties to the schema that weren't actually discovered.
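
To illustrate the override-only behavior described here, a minimal Python sketch follows. It is not Meltano's actual implementation; the function name override_existing and the sample schemas are hypothetical. It only shows why overrides have no effect when the discovered schema is the generic {"type": "object"} with no properties.

```python
import copy


def override_existing(discovered: dict, overrides: dict) -> dict:
    """Apply overrides only to properties that were already discovered.

    Hypothetical helper for illustration; not taken from Meltano's code.
    """
    result = copy.deepcopy(discovered)
    props = result.get("properties", {})
    for name, override in overrides.items():
        if name in props:
            props[name].update(override)
        # Properties absent from the discovered schema are skipped.
    return result


# A tap with no insight into the source schema discovers only this.
discovered_generic = {"type": "object"}
# A tap that does discover properties.
discovered_typed = {"type": "object", "properties": {"id": {"type": "string"}}}

overrides = {"id": {"type": "integer"}, "name": {"type": "string"}}

generic_result = override_existing(discovered_generic, overrides)
typed_result = override_existing(discovered_typed, overrides)

print(generic_result)  # unchanged: nothing to override
print(typed_result)    # "id" is overridden, "name" is not added
```

In this sketch, the generic schema comes back untouched, which matches the symptom of the schema config never showing up in tap.properties.json.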

douwe_maan

08/05/2020, 7:36 PM

If the tap isn't discovering a useful catalog, that's one thing, but if it's actually sending empty schema messages to the loader, I'm not sure how it expects the loader to construct the table (or its equivalent)

douwe_maan

08/05/2020, 7:37 PM

Even if Meltano would allow you to specify schema info for properties that weren't actually discovered (which I wouldn't be opposed to), that info is actually only fed to the tap, not the target. The tap is responsible for feeding a schema to the target.

douwe_maan

08/05/2020, 7:38 PM

Or does the tap explicitly document that you're supposed to manually define your schema in the catalog, which it will then forward to the target?

douwe_maan

08/05/2020, 7:39 PM

If so, I'd suggest filing a feature proposal on our issue tracker to not just overwrite, but also append schema properties to the catalog's stream definition, and to have that work for selection and metadata rules too 🙂 All three features currently assume that the discovered catalog defines all streams and their properties.

julian_knight

08/05/2020, 8:19 PM

The tap is dynamodb, which has no documentation at all! 🙃 But based on some experimentation, it seems it expects you to set the schema in the catalog definition. Putting it in the catalog doesn't change the tap's discover behavior, but it does change its run behavior. I'm not familiar enough with the spec to know if this is in line with the Singer spec, but given this is an official tap, I think that maybe appending schema properties to the catalog makes sense. I'll open the issue.
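
The run behavior described here, where a tap forwards whatever schema the catalog holds, could look roughly like the Python sketch below. This is an assumption about the general pattern, not the dynamodb tap's actual code; the function schema_message and the "orders" stream are hypothetical. The message shape (type, stream, schema, key_properties) follows the Singer SCHEMA message format.

```python
import json


def schema_message(catalog_stream: dict) -> dict:
    """Build a Singer SCHEMA message from one catalog stream entry.

    Hypothetical helper: the schema is forwarded as-is from the catalog.
    """
    return {
        "type": "SCHEMA",
        "stream": catalog_stream["stream"],
        "schema": catalog_stream["schema"],
        "key_properties": catalog_stream.get("key_properties", []),
    }


# A catalog whose schema was filled in by hand (illustrative values).
catalog = {
    "streams": [
        {
            "tap_stream_id": "orders",
            "stream": "orders",
            "key_properties": ["id"],
            "schema": {
                "type": "object",
                "properties": {"id": {"type": "string"}},
            },
        }
    ]
}

messages = [schema_message(stream) for stream in catalog["streams"]]
for message in messages:
    # Singer taps write one JSON message per line to stdout.
    print(json.dumps(message))
```

Under this pattern, the target only ever sees the schema that ended up in the catalog file, so a catalog with an empty schema produces an empty SCHEMA message.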

julian_knight

08/05/2020, 9:05 PM

https://gitlab.com/meltano/meltano/-/issues/2216 🙂

douwe_maan

08/05/2020, 9:09 PM

Thanks! The spec allows for taps that don't implement a (useful) discovery mode but still take a catalog, so implementing deep-merge behavior makes sense to me!
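
For comparison, deep-merge behavior could be sketched in Python as below. This is only an illustration of the idea, not the implementation that was proposed or shipped; deep_merge and the sample schemas are hypothetical names and values.

```python
import copy


def deep_merge(base: dict, extra: dict) -> dict:
    """Recursively merge extra into a copy of base.

    Nested dicts are merged key by key; any other value in extra
    replaces the value in base. Keys missing from base are added.
    """
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Generic schema from a tap without useful discovery.
discovered = {"type": "object"}
# Schema supplied by the user (illustrative properties).
user_schema = {
    "properties": {
        "id": {"type": "string"},
        "created_at": {"type": "string", "format": "date-time"},
    }
}

merged_schema = deep_merge(discovered, user_schema)
print(merged_schema)
```

With a merge like this, user-supplied properties are appended to the stream definition even when discovery returned none.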
