Is there a way to run a tap so that it produces SC...
# singer-tap-development
l
Is there a way to run a tap so that it produces SCHEMA records with no actual data? We use
tap-github | target-postgres
and we currently have to run each stream and fetch data from GitHub in order for the target to generate the corresponding tables in the db. Doing this in CI in particular is problematic (time and rate-limiting issues). I'm thinking that the tap knows what the data structure is, since it's in the code, so there must be a way for it to tell the target? Something like the discovery mode, but outputting messages instead of a catalog. Did I miss something obvious?
a
Do you mean with the SDK-based tap-github? If so, what about stream maps with a
"__filter__": "1 == 0"
?
I don't think
--test
sends any messages at all right now, but in theory we could make a way to "turn on" its STDOUT behaviors, which would let it send 0 or 1 records per stream.
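For reference, a minimal sketch of the stream-map idea above, written as a small Python helper that builds the tap config. The `stream_maps` and `__filter__` keys come from the suggestion in this thread; the helper name `build_no_records_config` and the stream names in the usage example are hypothetical placeholders, not taken from tap-github.

```python
import json


def build_no_records_config(stream_names, base_config=None):
    """Return a tap config whose stream maps filter out every record.

    Hypothetical helper for illustration only.
    """
    config = dict(base_config or {})
    # "1 == 0" is always false, so the filter is expected to keep no records.
    config["stream_maps"] = {
        name: {"__filter__": "1 == 0"} for name in stream_names
    }
    return config


if __name__ == "__main__":
    # The stream names below are placeholders; use the tap's real stream names.
    example = build_no_records_config(["repositories", "issues"])
    print(json.dumps(example, indent=2))
```

Note that a filter like this only affects which records are emitted; it would not by itself stop the tap from calling the API.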
l
yes, I mean the sdk-based tap 🙂
I'll look at these 2 options, I'm not familiar with these corners of the code, but I'll report back, hopefully with a PR against tap-github 🤞
thanks for the pointers!
just did a quick check on the above. running
tap-github --test
produced 3065 messages even with a start_date set to this morning, and took 40 seconds to run. I did not try the stream map, but I understand it would result in roughly the same behaviour, without actually sending the messages. I did a quick proof of concept for a
--schema
option, which simply writes a SCHEMA message for each stream. It ran in 1 sec, produced 10 messages, and made no connections. Sadly target-postgres did not produce the tables without data 😅 so I'll keep doing what we currently do, but if you're interested, I can send an MR for the above changes in the sdk, and see if I can do the equivalent change on target-postgres (which is not sdk based, unfortunately).
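A rough sketch of what such a schema-only mode could look like outside the SDK, assuming a Singer catalog is already available as a dict (for example from discovery output). This is not the proof of concept mentioned above and not SDK code; the function names are hypothetical, and it assumes each catalog entry carries `schema` and a top-level `key_properties`.

```python
import json
import sys


def schema_messages(catalog):
    """Yield one Singer SCHEMA message per stream in the catalog, no records."""
    for entry in catalog.get("streams", []):
        yield {
            "type": "SCHEMA",
            # Fall back to tap_stream_id when no explicit stream name is set.
            "stream": entry.get("stream") or entry["tap_stream_id"],
            "schema": entry["schema"],
            # Assumption: key properties are stored at the top level of the entry.
            "key_properties": entry.get("key_properties", []),
        }


def write_schema_messages(catalog, out=None):
    """Write the SCHEMA messages as JSON lines and return how many were written."""
    out = out if out is not None else sys.stdout
    count = 0
    for message in schema_messages(catalog):
        out.write(json.dumps(message) + "\n")
        count += 1
    return count
```

Because nothing here touches the source API, it needs no connection; whether the target creates tables from SCHEMA messages alone still depends on the target.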
a
Hi, @laurent - Did
--test
actually send messages? That's great. I didn't realize we were still emitting them (versus parsing them silently). @visch has logged a bug fix MR to solve for the excessive number of records being parsed on child streams, so I think you can expect that resolved once this merges.
Sadly target-postgres did not produce the tables without data.
Interesting... I wonder if
--test
should just emit one record per stream? Currently, I think the behavior is max records = 1 for parents and 0 for non-parents. (Originally it was
0
but we needed at least 1 parent record in order to iterate over the child stream types.)
l
I'll keep an eye on that PR, thanks for the link! And I'll look at
target-postgres
to see if there's a way to shortcut all these api calls.
Looping back to this, I opened https://gitlab.com/meltano/sdk/-/merge_requests/218 to propose a solution for it.
target-postgres
actually has an option, which I had previously missed, that allows this to work:
persist_empty_tables
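A minimal sketch of turning that option on, assuming it is a boolean flag in the target's JSON config. Only the `persist_empty_tables` name comes from this thread; the helper name and the other key in the usage example are hypothetical.

```python
import json


def with_persist_empty_tables(target_config):
    """Return a copy of the target config with persist_empty_tables enabled."""
    config = dict(target_config)
    # Assumption: the option is a boolean in the target's JSON config file.
    config["persist_empty_tables"] = True
    return config


if __name__ == "__main__":
    # "example_setting" is a placeholder for whatever the existing config holds.
    print(json.dumps(with_persist_empty_tables({"example_setting": "value"}), indent=2))
```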