Is there a straightforward way to do a schema-only `meltano elt` run? | Meltano #troubleshooting

tj_murphy

04/08/2022, 2:27 AM

Is there a straightforward way to do a schema-only `meltano elt` run?

aaronsteers

04/08/2022, 6:05 AM

Not easily, no. There are options, but none are universal or very streamlined as of yet. SDK-based taps may have `--test` (1 row per stream) or `--test=schema` (schema only, no records). And per case, you can also use stream maps with a `__filter__` of something like `1 == 0` to suppress all rows. Are there specific taps you'd like to do this for?
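
For illustration, the stream-map approach could be sketched roughly as below: a small Python helper that builds a `stream_maps` setting in which every listed stream gets a `__filter__` that is always false. The stream names are hypothetical, and whether a given tap honours `stream_maps` (and accepts the expression exactly as written) depends on the tap, so treat this as a starting point rather than a confirmed configuration.

```python
import json


def schema_only_stream_maps(stream_names):
    """Build a stream_maps setting whose filter rejects every record."""
    # "1 == 0" always evaluates to false, so no row passes the filter,
    # while the stream's schema can still be emitted by the tap.
    return {
        "stream_maps": {
            name: {"__filter__": "1 == 0"} for name in stream_names
        }
    }


# "users" and "orders" are hypothetical stream names for the example.
config = schema_only_stream_maps(["users", "orders"])
print(json.dumps(config, indent=2))
```

The printed JSON is the shape that would go into the tap's settings; how it is supplied (for example through the tap's config in `meltano.yml`) is left to the project.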

aaronsteers

04/08/2022, 6:12 AM

Feel free to add comments here if a mapping extension would be helpful: Exclude all rows or limit `n` rows per stream · Issue #13 · MeltanoLabs/meltano-map-transform

visch

04/11/2022, 12:01 PM

To me this seems like you're not doing ELT, so it makes sense that you'd have to call `meltano invoke tap-name --discover` for schema-only stuff. Another way I've done this is to add "information schema"-like tables to my source and then select them. For Oracle I selected the `sys.*` tables, and then you can think about them as `elt` 🤷 Having `sdc_catalog` as its own stream has also seemed interesting to me, but you'd have to either force all taps to do it or do it yourself in the orchestrator layer. This one ties into the `metadata` issues Meltano has.
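
As a rough sketch of working with discovery output, the helper below summarizes a Singer catalog as stream name to column types, which is one way to eyeball how source data types were mapped. It assumes the usual catalog layout (a top-level `streams` list whose entries carry a JSON Schema under `schema`); the sample catalog is hand-written for the example and not taken from any real tap.

```python
import json


def summarize_catalog(catalog):
    """Map each stream name to {property name: JSON Schema type}."""
    summary = {}
    for entry in catalog.get("streams", []):
        # Prefer the human-readable stream name, fall back to the stream id.
        name = entry.get("stream") or entry.get("tap_stream_id")
        properties = entry.get("schema", {}).get("properties", {})
        summary[name] = {col: spec.get("type") for col, spec in properties.items()}
    return summary


# A tiny hand-written catalog standing in for real --discover output.
sample_catalog = {
    "streams": [
        {
            "tap_stream_id": "public-users",
            "stream": "users",
            "schema": {
                "type": "object",
                "properties": {
                    "id": {"type": ["integer"]},
                    "email": {"type": ["null", "string"]},
                },
            },
        }
    ]
}

print(json.dumps(summarize_catalog(sample_catalog), indent=2))
```

With a real tap, the catalog printed by `meltano invoke tap-name --discover` could be saved to a file, loaded with `json.load`, and passed to `summarize_catalog` in place of the sample.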

tj_murphy

04/29/2022, 5:25 PM

My use cases are:

1. When trying out Meltano for the first time for a new source, I want to ensure it works for all the data types I use in that source, and that it maps those data types to something reasonable in my destination.
2. For my workloads (Postgres tables with 10B+ rows representing 5+ TBs of data), Meltano does not replicate in a reasonable amount of time. As a workaround, I want Meltano to create the table, I manually backfill in a performant way, and then I let Meltano take over for ongoing replication.

aaronsteers

04/29/2022, 11:30 PM

@tj_murphy - I think there's a viable path forward, which we likely could handle in a contributed MR if you are interested in taking this on. The path forward would likely be something like:

Add a `--schema-only` option (or similar) to `meltano run` and/or `meltano elt`.

When `--schema-only` is invoked, several behaviors are triggered:

1. State is ignored.
2. `start_date` for the tap is set to tomorrow.
3. Meltano drops on the floor any `RECORD` messages it sees passing from the tap to the target.

An alternative (perhaps better?) implementation would be:

1. Run discovery if needed.
2. Instead of invoking the tap, Meltano iterates through the catalog and sends its own `SCHEMA` messages to the target, one corresponding to the schema of each stream in the catalog.
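
Both ideas can be sketched in a few lines of Python. This is not Meltano code and no `--schema-only` option is implied to exist; the function names are made up for illustration. `drop_record_messages` shows the "drop `RECORD` messages" behavior on a stream of Singer message lines, and `schema_messages_from_catalog` shows the alternative of building one `SCHEMA` message per catalog stream. Reading key properties from the `table-key-properties` metadata entry (with a fallback to a top-level `key_properties` field) is an assumption about the catalog layout, and the sketch ignores stream selection.

```python
import json


def drop_record_messages(lines):
    """Yield Singer message lines, skipping every RECORD message."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if json.loads(line).get("type") == "RECORD":
            continue  # schema-only: rows never reach the target
        yield line  # SCHEMA, STATE, and other messages pass through untouched


def key_properties(entry):
    """Best-effort lookup of a catalog entry's key properties (assumed layout)."""
    for item in entry.get("metadata", []):
        if item.get("breadcrumb") == []:
            keys = item.get("metadata", {}).get("table-key-properties")
            if keys is not None:
                return list(keys)
    return list(entry.get("key_properties", []))


def schema_messages_from_catalog(catalog):
    """Yield one SCHEMA message per stream in the catalog."""
    for entry in catalog.get("streams", []):
        yield {
            "type": "SCHEMA",
            "stream": entry.get("stream") or entry.get("tap_stream_id"),
            "schema": entry.get("schema", {}),
            "key_properties": key_properties(entry),
        }


# Hand-written example catalog; a real one would come from discovery.
example_catalog = {
    "streams": [
        {
            "stream": "users",
            "schema": {"type": "object", "properties": {"id": {"type": "integer"}}},
            "metadata": [
                {"breadcrumb": [], "metadata": {"table-key-properties": ["id"]}}
            ],
        }
    ]
}

for message in schema_messages_from_catalog(example_catalog):
    print(json.dumps(message))
```

Note that the filter leaves `STATE` messages alone; a real implementation would still need to decide what to do with them, since the first proposal also says state should be ignored.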

aaronsteers

04/29/2022, 11:31 PM

@tj_murphy - What do you think of these options? Would you mind logging an issue on this (if one doesn't already exist) so we can discuss in further detail?
