# singer-tap-development
So I’ve been messing around with parent-child streams, and I’d like to mention a couple of things I’ve found.
• `get_child_context()` gets called after `post_process()`. Would it not make more sense to call `get_child_context()` on the unprocessed record? For example, I had to output a JSON string as the value of a column in the final record, but I needed a value from within that JSON string for `get_child_context()`. Because `get_child_context()` is called on the post-processed record, I had to call `json.loads(record)`, which seems like an unnecessary step.
• It would be nice to be able to access the `context` passed to a child stream from any section of the stream. For example, my parent stream might pass a list of ids to query in the child, but the child endpoint only takes a single id per query. It would be helpful to be able to access the `context` from `get_next_page_token()` and pop an id off as needed, instead of writing indexing logic within `get_next_page_token()`. I’m sure there are other places where access to `context` would be helpful but isn’t currently possible; this is just the most immediate example I have.
• It could also be helpful to have some sort of mechanism to build up a `context` in a parent stream to pass to a child. Since `get_child_context()` is called on each record, you may only be able to pull one value from that record, but it’s possible that the child stream can take a list of values per request. So if the parent stream produces N records, the current implementation means you have to make N calls to the API via the child stream. Building up a context to pass to the child would mean you could make just one call.
Maybe there is a way to access the `context` from anywhere within a stream, and I’m just unaware of it.
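To make the first point concrete, here is a minimal sketch of the workaround in plain Python. The two functions are stand-ins that only loosely mirror the `post_process()` and `get_child_context()` hooks; they are not the SDK's code, and the field names (`payload`, `account_id`) are made up for illustration.

```python
import json


def post_process(row):
    """Stand-in hook: serialize the nested payload to a JSON string column."""
    processed = dict(row)
    processed["payload"] = json.dumps(row["payload"])
    return processed


def get_child_context(record):
    """Stand-in hook: runs on the post-processed record, so the JSON string
    has to be decoded again to reach the value the child stream needs."""
    payload = json.loads(record["payload"])  # the extra step described above
    return {"account_id": payload["account_id"]}


# Hypothetical raw record, as it might come back from the parent endpoint.
raw = {"id": 1, "payload": {"account_id": "a-42", "plan": "pro"}}

record = post_process(raw)           # payload is now a string
context = get_child_context(record)  # needs json.loads to get account_id
print(context)
```

If `get_child_context()` saw the unprocessed record, the `json.loads` call would not be needed, since `raw["payload"]["account_id"]` is directly available.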
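For the second point, this is roughly the kind of indexing logic I mean, sketched outside the SDK. It assumes the parent passes a non-empty list of ids and that the page token is simply the index of the id to query next; the function names and the driver loop are hypothetical and only imitate how a paginated stream would call them.

```python
def id_for_request(ids, token):
    """Pick the id to query: a token of None means the first request."""
    return ids[0 if token is None else token]


def get_next_page_token(ids, previous_token):
    """Stand-in for the pagination hook: advance the index, or return None
    once every id has been queried."""
    current = 0 if previous_token is None else previous_token
    next_index = current + 1
    return next_index if next_index < len(ids) else None


def run(ids):
    """Hypothetical driver loop: one request per id, in order."""
    queried = []
    token = None
    while True:
        queried.append(id_for_request(ids, token))
        token = get_next_page_token(ids, token)
        if token is None:
            break
    return queried


print(run(["a", "b", "c"]))
```

With access to the `context` inside `get_next_page_token()`, the index bookkeeping could be replaced by popping ids off the list until it is empty.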