Jens Christian Hillerup
11/27/2024, 2:23 PM
GET /documents that gives me a list of documents, and then another one like GET /document/<doc_id> that returns important information that's not part of the former. I've wrapped REST APIs before with the tap cookiecutter in the Meltano SDK, but how would this actually translate into a RESTStream in the Singer tap? Is it even a supported use case to have an "N+1" stream which requires another roundtrip per record? (I'm aware it's going to be slow, luckily it's not that many rows). Thanks!
Andy Carter
11/27/2024, 3:43 PM
DocumentSummary which hits the 'get documents' endpoint and then a DocumentDetail child stream to each, assuming a 1:1 relationship. Then you need to rejoin them downstream in DBT or similar.
Reuben (Matatika)
11/27/2024, 5:19 PM
tap-spotify - get tracks and then make a (single) separate request to grab the audio features for each and merge into the track record. Sort of violates the principle of ELT as this is technically a transformation, but it is possible.
https://github.com/Matatika/tap-spotify/blob/f944d7430f9003ef589acec21f67b16c14a09095/tap_spotify/streams.py#L41-L94
Jens Christian Hillerup
12/04/2024, 9:16 AM
aiohttp to extract more of these documents simultaneously? Suppose I could also do a multiprocessing + requests thing.
Reuben (Matatika)
12/04/2024, 9:42 AM
GET /document/{id}), I would go with that approach using child streams.
There's been a fair amount of talk around Meltano running streams in parallel, so if you did declare document detail as a child stream you may be able to benefit from that feature in the future. I think this is the primary issue: https://github.com/meltano/meltano/issues/2677
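On the earlier question about fetching several documents at once: a rough sketch of the "one list request, then one detail request per record" pattern, using only the standard library's thread pool rather than aiohttp or multiprocessing. This is not Meltano SDK code. The names fetch_with_details, list_documents and get_document are hypothetical, and the two fake endpoint functions stand in for GET /documents and GET /document/<doc_id>. Threads are assumed to be enough here because the work is waiting on HTTP responses, not CPU.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def fetch_with_details(
    summaries: Iterable[dict],
    fetch_detail: Callable[[str], dict],
    max_workers: int = 8,
) -> Iterator[dict]:
    """Yield each summary merged with its detail record, in input order."""
    summaries = list(summaries)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in input order; a worker's exception is
        # re-raised when its result is reached in the loop below.
        details = pool.map(fetch_detail, (s["id"] for s in summaries))
        for summary, detail in zip(summaries, details):
            # Detail fields win if both records share a key.
            yield {**summary, **detail}


# Hypothetical stand-ins for the two endpoints (no network involved).
FAKE_DETAILS = {
    "a": {"id": "a", "body": "first"},
    "b": {"id": "b", "body": "second"},
}


def list_documents() -> list[dict]:
    # Stand-in for GET /documents
    return [{"id": "a", "title": "A"}, {"id": "b", "title": "B"}]


def get_document(doc_id: str) -> dict:
    # Stand-in for GET /document/<doc_id>
    return FAKE_DETAILS[doc_id]


merged = list(fetch_with_details(list_documents(), get_document))
```

In a real tap, get_document would wrap an actual HTTP call, and max_workers would need to respect whatever rate limit the API enforces. The merged records correspond to the "merge into one record" approach; with child streams the two record sets stay separate and are joined downstream instead.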