Is there a straightforward way to do a schema-only...
# troubleshooting
t
Is there a straightforward way to do a schema-only `meltano elt` run?
a
Not easily, no. There are options, but none are universal or very streamlined yet. SDK-based taps may have `--test` (1 row per stream) or `--test=schema` (schema only, no records). And on a case-by-case basis, you can also use stream maps with a `__filter__` of something like `1 == 0` to suppress all rows. Are there specific taps you'd like to do this for?
Feel free to add comments here if a mapping extension would be helpful: Exclude all rows or limit `n` rows per stream · Issue #13 · MeltanoLabs/meltano-map-transform
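To illustrate why a `__filter__` of `1 == 0` suppresses all rows, here is a minimal Python sketch of the idea. It is not the SDK's stream map implementation: `apply_filter` is a hypothetical helper, and plain `eval` stands in for whatever expression evaluator the tap or mapper actually uses. The sample messages are made up.

```python
def apply_filter(messages, filter_expr):
    """Yield Singer messages, dropping RECORD messages whose filter is falsy.

    Hypothetical helper: the record's fields are exposed as names to the
    expression, and SCHEMA/STATE messages always pass through.
    """
    for message in messages:
        if message.get("type") == "RECORD":
            # Stand-in evaluator (assumption): builtins are disabled and the
            # record's own fields are the only names available.
            keep = eval(filter_expr, {"__builtins__": {}}, dict(message["record"]))
            if not keep:
                continue
        yield message


# Made-up sample stream: one SCHEMA message followed by two rows.
messages = [
    {
        "type": "SCHEMA",
        "stream": "users",
        "schema": {"properties": {"id": {"type": "integer"}}},
        "key_properties": ["id"],
    },
    {"type": "RECORD", "stream": "users", "record": {"id": 1}},
    {"type": "RECORD", "stream": "users", "record": {"id": 2}},
]

kept = list(apply_filter(messages, "1 == 0"))
print([m["type"] for m in kept])  # -> ['SCHEMA']
```

Because `1 == 0` is false for every row, only the SCHEMA message reaches the target, which is the schema-only behavior being asked about.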
v
To me this seems like you're not doing ELT, so it makes sense that you'd have to call `meltano invoke tap-name --discover` for schema-only stuff. Another way I've done this is to add "information schema"-like tables to my source and then select them. For Oracle I selected the `sys.*` tables, and then you can think about them as `elt` 🤷
Having `sdc_catalog` as its own stream has also seemed interesting to me, but you'd have to either force all taps to do it or do it yourself in the orchestrator layer. This one ties into the `metadata` issues Meltano has.
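For the discovery route, a rough Python sketch of inspecting a catalog for column types. It assumes the discovery output is a Singer catalog (a top-level `streams` list whose entries carry a JSON Schema under `schema`); `summarize_catalog` and the sample catalog are hypothetical and only for illustration.

```python
import json


def summarize_catalog(catalog):
    """Map each stream ID to {property name: JSON Schema type}."""
    summary = {}
    for stream in catalog.get("streams", []):
        name = stream.get("tap_stream_id") or stream.get("stream")
        properties = stream.get("schema", {}).get("properties", {})
        summary[name] = {col: spec.get("type") for col, spec in properties.items()}
    return summary


# Made-up discovery output for a single table.
sample_catalog = json.loads("""
{
  "streams": [
    {
      "tap_stream_id": "public-orders",
      "stream": "orders",
      "schema": {
        "type": "object",
        "properties": {
          "id": {"type": "integer"},
          "amount": {"type": ["null", "number"]},
          "created_at": {"type": ["null", "string"], "format": "date-time"}
        }
      }
    }
  ]
}
""")

summary = summarize_catalog(sample_catalog)
for stream_id, columns in summary.items():
    for column, json_type in columns.items():
        print(stream_id, column, json_type)
```

This only shows what the tap reports; how the target maps those types to destination column types still has to be checked on the target side.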
t
My use cases are:
1. When trying out Meltano for the first time for a new source, I want to ensure it works for all the data types I use in that source, and that it maps those data types to something reasonable in my destination.
2. For my workloads – Postgres tables with 10B+ rows representing 5+ TB of data – Meltano does not replicate in a reasonable amount of time. As a workaround, I want Meltano to create the table, I manually backfill in a performant way, and then I let Meltano take over for ongoing replication.
a
@tj_murphy - I think there's a viable path forward, which we could likely handle in a contributed MR if you are interested in taking this on. The path forward would likely be something like:
Add a `--schema-only` option (or similar) to `meltano run` and/or `meltano elt`.
When `--schema-only` is invoked, several behaviors are triggered:
1. State is ignored.
2. `start_date` for the tap is set to tomorrow.
3. Meltano drops on the floor any `RECORD` messages it sees passing from the tap to the target.
An alternative (perhaps better?) implementation would be:
1. Run discovery if needed.
2. Instead of invoking the tap, Meltano iterates through the catalog and sends its own `SCHEMA` messages to the target, one corresponding to the schema of each stream in the catalog.
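A rough Python sketch of the two message-level ideas above, to make the discussion concrete. Nothing here is existing Meltano code: `drop_records`, `key_properties`, and `schema_messages_from_catalog` are hypothetical names, and the catalog layout (stream-level metadata with an empty breadcrumb holding `table-key-properties`) is assumed from the Singer catalog convention.

```python
import json


def drop_records(lines):
    """Option 1: pass tap output through, discarding RECORD messages."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if json.loads(line).get("type") != "RECORD":
            yield line


def key_properties(stream):
    """Read key properties from stream-level metadata, else the legacy field."""
    for entry in stream.get("metadata", []):
        if entry.get("breadcrumb") == []:
            keys = entry.get("metadata", {}).get("table-key-properties")
            if keys is not None:
                return keys
    return stream.get("key_properties", [])


def schema_messages_from_catalog(catalog):
    """Option 2: build one SCHEMA message per stream in the catalog."""
    for stream in catalog.get("streams", []):
        yield {
            "type": "SCHEMA",
            "stream": stream.get("stream") or stream["tap_stream_id"],
            "schema": stream.get("schema", {}),
            "key_properties": key_properties(stream),
        }


# Made-up tap output and catalog for illustration.
tap_output = [
    '{"type": "SCHEMA", "stream": "orders", "schema": {}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "orders", "record": {"id": 1}}',
    '{"type": "STATE", "value": {}}',
]
catalog = {
    "streams": [
        {
            "tap_stream_id": "public-orders",
            "stream": "orders",
            "schema": {"properties": {"id": {"type": "integer"}}},
            "metadata": [
                {"breadcrumb": [], "metadata": {"table-key-properties": ["id"]}}
            ],
        }
    ]
}

passed = [json.loads(line)["type"] for line in drop_records(tap_output)]
generated = list(schema_messages_from_catalog(catalog))
print(passed)
print(json.dumps(generated[0]))
```

Note that this version of option 1 still forwards STATE messages; whether a schema-only run should also suppress those is an open design question. Option 2 never runs the tap's sync at all, so it avoids reading any source rows.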
@tj_murphy - What do you think of these options? Would you mind logging an issue on this (if one doesn't already exist) so we can discuss in further detail?