Is there a way to run a tap so that it produces SCHEMA records with no actual data? | Meltano #singer-tap-development

laurent

11/17/2021, 6:27 PM

Is there a way to run a tap so that it produces SCHEMA records with no actual data? We use `tap-github | target-postgres`, and we currently have to run each stream and fetch data from GitHub in order for the target to generate the corresponding tables in the db. Doing this in CI in particular is problematic (time and rate-limiting issues). I'm thinking that the tap knows what the data structure is, since it's in the code, so there must be a way for it to tell the target? Something like discovery mode, but outputting messages instead of a catalog. Did I miss something obvious?

aaronsteers

11/17/2021, 6:50 PM

Do you mean with the SDK-based tap-github? If so, what about stream maps with a `"__filter__": "1 == 0"`?

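For readers following along, the stream-map suggestion above can be sketched as a config fragment. This is only a sketch: the stream names are hypothetical placeholders, and it assumes the SDK-based tap reads a `stream_maps` setting keyed by stream name. Note that a filter only drops records before they are emitted; the tap would still read them from the API.

```python
import json

# Hypothetical stream names; substitute the streams your catalog selects.
STREAMS = ["repositories", "issues", "commits"]


def build_filter_all_config(streams):
    """Return a config fragment whose stream maps drop every record."""
    return {
        "stream_maps": {
            # "1 == 0" is never true, so no record passes the filter.
            name: {"__filter__": "1 == 0"}
            for name in streams
        }
    }


if __name__ == "__main__":
    # Print the fragment so it can be merged into the tap's config file.
    print(json.dumps(build_filter_all_config(STREAMS), indent=2))
```
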
aaronsteers

11/17/2021, 6:51 PM

I don't think `--test` sends any messages at all right now, but in theory we could make a way to "turn on" its STDOUT behavior, which would let it send 0 or 1 records per stream.

laurent

11/17/2021, 7:42 PM

yes, I mean the sdk-based tap 🙂

laurent

11/17/2021, 7:44 PM

I'll look at these 2 options, I'm not familiar with these corners of the code, but I'll report back, hopefully with a PR against tap-github 🤞

laurent

11/17/2021, 7:44 PM

thanks for the pointers!

laurent

11/17/2021, 11:28 PM

Just did a quick check on the above. Running `tap-github --test` produced 3065 messages even with a start_date set to this morning, and took 40 seconds to run. I did not try the stream map, but I understand it would result in roughly the same behaviour, without actually sending the messages. I did a quick proof of concept for a `--schema` option, which I just made write a SCHEMA message for each stream. It ran in 1 sec, produced 10 messages and made no connections. Sadly target-postgres did not produce the tables without data 😅 so I'll keep doing what we currently do, but if you're interested, I can send an MR for the above changes in the SDK, and see if I can do the equivalent change on target-postgres (which is not SDK-based, unfortunately).
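
The schema-only idea described above can be illustrated without the SDK. The following is a minimal sketch, not the actual proof of concept: the stream names and schemas are hypothetical, and it only assumes the Singer message format, where a SCHEMA message carries `type`, `stream`, `schema` and `key_properties`.

```python
import json
import sys

# Hypothetical schemas for illustration; a real tap would take these
# from its own stream definitions.
SCHEMAS = {
    "repositories": {
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": ["string", "null"]},
            },
        },
    },
    "issues": {
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "title": {"type": ["string", "null"]},
            },
        },
    },
}


def schema_messages(schemas):
    """Yield one Singer SCHEMA message per stream, with no RECORD messages."""
    for stream, spec in schemas.items():
        yield {
            "type": "SCHEMA",
            "stream": stream,
            "schema": spec["schema"],
            "key_properties": spec["key_properties"],
        }


if __name__ == "__main__":
    # One JSON document per line, as a target expects on STDIN.
    for message in schema_messages(SCHEMAS):
        sys.stdout.write(json.dumps(message) + "\n")
```

Because nothing is fetched, a run like this makes no API calls at all; whether tables get created from schema-only input then depends entirely on the target.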

aaronsteers

11/17/2021, 11:47 PM

Hi, @laurent - Did `--test` actually send messages? That's great. I didn't realize we were still emitting them (versus parsing them silently). @visch has logged a bug fix MR to solve for the excessive number of records being parsed on child streams, so I think you can expect that resolved once this merges.

aaronsteers

11/17/2021, 11:49 PM

> Sadly target-postgres did not produce the tables without data.

Interesting... I wonder if `--test` should just emit one record each per stream? Currently, I think the behavior is max records = 1 for parents and 0 for non-parents. (Originally was

but we needed at least 1 parent record in order to iterate over the child stream types.)

laurent

11/18/2021, 12:01 AM

I'll keep an eye on that PR, thanks for the link! And I'll look at `target-postgres` to see if there's a way to shortcut all these API calls.

laurent

12/07/2021, 12:44 PM

Looping back to this, I opened https://gitlab.com/meltano/sdk/-/merge_requests/218 to propose a solution for it. `target-postgres` actually has an option that I had previously missed, `persist_empty_tables`, which allows this to work.
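
As a rough illustration of the option named above, the sketch below turns on `persist_empty_tables` in a target config dictionary. It assumes the option is a boolean at the top level of the target's config; the helper name is hypothetical, and connection settings are deliberately left out.

```python
import json


def with_empty_tables(target_config):
    """Return a copy of the target config with persist_empty_tables enabled."""
    updated = dict(target_config)  # leave the caller's dict untouched
    updated["persist_empty_tables"] = True
    return updated


if __name__ == "__main__":
    # Connection settings are omitted; only the one option is shown.
    print(json.dumps(with_empty_tables({}), indent=2))
```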
