laurent
11/17/2021, 6:27 PM
tap-github | target-postgres and we currently have to run each stream, fetching data from GitHub, in order for the target to generate the corresponding tables in the db. Doing this in CI in particular is problematic (time and rate-limiting issues). I'm thinking that the tap knows what the data structure is, since it's in the code, so there must be a way for it to tell the target? Something like the discovery mode, but outputting messages instead of a catalog. Did I miss something obvious?
aaronsteers
11/17/2021, 6:50 PM
"__filter__": "1 == 0"?
aaronsteers
11/17/2021, 6:51 PM
--test sends any messages at all right now, but in theory we could make a way to "turn on" its STDOUT behaviors, which would let it send 0 or 1 records per stream.
laurent
11/17/2021, 7:42 PM
laurent
11/17/2021, 7:44 PM
laurent
11/17/2021, 7:44 PM
laurent
11/17/2021, 11:28 PM
tap-github --test produced 3065 messages even with a start_date set to this morning, and took 40 seconds to run.
I did not try the stream map, but I understand it would result in roughly the same behaviour, without actually sending the messages.
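For reference, the stream map filter suggested earlier in the thread could be written roughly as follows. This is a minimal sketch, assuming the tap is SDK-based and accepts a stream_maps setting in its config; the helper name filter_all_records and the stream names are hypothetical placeholders, not taken from tap-github.

```python
import json


def filter_all_records(stream_names):
    """Build a stream_maps config whose filter rejects every record."""
    # "1 == 0" is never true, so no record should pass the filter.
    # Assumption: the tap reads a "stream_maps" key from its config.
    return {
        "stream_maps": {name: {"__filter__": "1 == 0"} for name in stream_names}
    }


if __name__ == "__main__":
    # Placeholder stream names; substitute the streams you actually select.
    config = filter_all_records(["repositories", "issues"])
    print(json.dumps(config, indent=2))
```

As noted above, a filter like this would only suppress the records; the tap would still have to fetch the data, so it does not help with the time or rate limiting.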
I did a quick proof of concept for a --schema option, which I just made write a schema message for each stream. It ran in 1 sec and produced 10 messages and no connections.
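To illustrate the idea (this is not the actual proof of concept, which is not shown in the thread), here is a minimal sketch that writes one Singer SCHEMA message per stream without making any API calls. The function name schema_messages and the two example streams are hypothetical; only the message shape (type, stream, schema, key_properties) follows the Singer spec.

```python
import json
import sys


def schema_messages(streams):
    """Yield one Singer SCHEMA message (as a JSON string) per stream."""
    for name, spec in streams.items():
        message = {
            "type": "SCHEMA",
            "stream": name,
            "schema": spec["schema"],
            "key_properties": spec.get("key_properties", []),
        }
        yield json.dumps(message)


# Hypothetical stream definitions, standing in for what the tap knows from its code.
EXAMPLE_STREAMS = {
    "repositories": {
        "schema": {"type": "object", "properties": {"id": {"type": "integer"}}},
        "key_properties": ["id"],
    },
    "issues": {
        "schema": {"type": "object", "properties": {"id": {"type": "integer"}}},
        "key_properties": ["id"],
    },
}

if __name__ == "__main__":
    # One line per message on STDOUT, ready to pipe into a target.
    for line in schema_messages(EXAMPLE_STREAMS):
        sys.stdout.write(line + "\n")
```

Whether a given target creates tables from SCHEMA messages alone depends on the target.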
Sadly target-postgres did not produce the tables without data 😅 so I'll keep doing what we currently do, but if you're interested, I can send an MR for the above changes in the sdk, and see if I can do the equivalent change on target-postgres (which is not sdk based, unfortunately).
aaronsteers
11/17/2021, 11:47 PM
--test actually send messages? That's great. I didn't realize we were still emitting them (versus parsing them silently). @visch has logged a bug fix MR to solve for the excessive number of records being parsed on child streams, so I think you can expect that resolved once this merges.
aaronsteers
11/17/2021, 11:49 PM
"Sadly target-postgres did not produce the tables without data."
Interesting... I wonder if --test should just emit one record each per stream? Currently, I think the behavior is max records = 1 for parents and 0 for non-parents. (Originally was 0 but we needed at least 1 parent record in order to iterate over the child stream types.)
laurent
11/18/2021, 12:01 AM
target-postgres to see if there's a way to shortcut all these API calls.
laurent
12/07/2021, 12:44 PM
target-postgres actually has an option, which I had previously missed, that allows this to work: persist_empty_tables
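A minimal sketch of turning that option on in a target config, assuming it is a boolean setting in the target's JSON config. The helper name with_empty_tables and the postgres_schema value are hypothetical placeholders; connection settings are omitted.

```python
import json


def with_empty_tables(target_config):
    """Return a copy of a target config with persist_empty_tables enabled."""
    updated = dict(target_config)  # copy, so the caller's dict is left untouched
    # Assumption: the option is a boolean flag in the target's config.
    updated["persist_empty_tables"] = True
    return updated


if __name__ == "__main__":
    # Placeholder config; real connection settings are left out on purpose.
    base = {"postgres_schema": "github"}
    print(json.dumps(with_empty_tables(base), indent=2))
```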