# singer-tap-development
m
Hi there, I am currently switching my taps from singer.io to the Meltano framework. Today I stumbled across the first tap I may not be able to switch, but I would be glad to get your input, or maybe this should be a feature request for the Meltano framework. I sync data from an API that is not really well written, so I need to sync all data and cannot just sync changes. To be specific, I sync contracts and ledger entries. Ledger entries depend on contracts (parent-child relation). Now the issue: as there is a huge amount of data, I set up multi-threading for the child context. This means that for each parent result, I start X threads to sync the children. Without it, the job runs for days; with it, just hours. I found no way to do multi-threading using only the Meltano SDK. Did I miss something? Could this maybe be added as a feature request? I am astonished that no one else has had this "issue" before. Glad for every input and have a nice weekend 😉
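For illustration, the per-parent fan-out described above could look roughly like the following in plain Python, outside any SDK. This is only a sketch under assumptions: `fetch_children` is a hypothetical stand-in for the real ledger-entries API call, the record shapes are invented, and a single shared thread pool is used instead of starting a fresh set of threads per parent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def fetch_children(contract: Record) -> List[Record]:
    """Hypothetical stand-in for the API call that returns one contract's ledger entries."""
    return [{"contract_id": contract["id"], "entry": n} for n in range(2)]


def sync_children(contracts: Iterable[Record], max_workers: int = 8) -> Iterator[Record]:
    """Fetch the children of many parents concurrently, yielding them in parent order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch_children in worker threads and returns the
        # results in the same order as the input contracts.
        for entries in pool.map(fetch_children, contracts):
            yield from entries


contracts = [{"id": i} for i in range(3)]  # assumed parent records
ledger_entries = list(sync_children(contracts))
```

Threads fit here because the work is dominated by waiting on HTTP responses; emitting records from the main thread only (as the generator does) avoids interleaved output.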
e
I think this is the closest FR we have to what you need: https://github.com/meltano/sdk/issues/183. There's surely a good amount of refactoring needed for this and I personally don't have a reference implementation to work with. I'd love for the SDK to support this, though!
m
Thanks @edgar_ramirez_mondragon. Is there a way to press this request? I think this could really boost usage, as this is a limitation for many endpoints.
a
@michel_ebner have you investigated this any further? We are currently facing a very similar issue. We have the following stream structure: Parent stream > Child stream > Grandchild stream. The parent stream gets 20 entities per API call, but the child and grandchild streams only get 1 result at a time. So even though we get 20 entities really fast, to get the next 20 we have to wait for an extra 40 serial API calls.
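To make the cost concrete, here is a rough sketch of running each parent's child and grandchild calls as one unit of work in a thread pool, so the 40 calls per page overlap instead of running back to back. Everything here is assumed for illustration: `get_child` and `get_grandchild` are hypothetical stand-ins that only sleep, and `LATENCY` is an invented per-call delay.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.05  # assumed per-call latency in seconds, for illustration only


def get_child(parent_id: int) -> dict:
    """Hypothetical child-stream call: one result per request."""
    time.sleep(LATENCY)
    return {"parent_id": parent_id, "child_id": f"c{parent_id}"}


def get_grandchild(child: dict) -> dict:
    """Hypothetical grandchild-stream call: one result per request."""
    time.sleep(LATENCY)
    return {"child_id": child["child_id"], "grandchild_id": f"g{child['child_id']}"}


def sync_branch(parent_id: int) -> tuple:
    """Child then grandchild for one parent; the two calls stay sequential."""
    child = get_child(parent_id)
    return child, get_grandchild(child)


parent_ids = list(range(20))  # one page of 20 parent entities
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    branches = list(pool.map(sync_branch, parent_ids))
elapsed = time.perf_counter() - start
```

Within one branch the grandchild call still waits for its child, but the 20 branches run side by side, so the wall-clock time is closer to two call latencies than to forty. Whether the real API tolerates 20 concurrent requests is a separate question (rate limits would cap `max_workers`).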
m
@andrio_frizon Quick answer: I did not find an alternative solution. Long answer: I looked into different possibilities. Rewriting some methods from the Meltano SDK proved to be too messy and did not work as expected. There is just too much to rewrite to make it work. I ended up staying on pure Singer SDK, where I store the data locally for marked streams and run some streams, according to another mark, with multi-threading. I think this issue becomes more important as many APIs are structured that way. Maybe with many requests, @edgar_ramirez_mondragon and the team may prioritise this 😛
e
Is there a way to press this request?
This is a significant refactor, so it's gonna be hard for us to prioritize, but I'll be happy to review PRs (plural, since we should do this in stages 😅). Do give the issue a 👍 in the meantime!