Does meltano SDK support multithreading/parallel e...
# best-practices
a
Does the Meltano SDK support multithreading/parallel execution? I'm building a tap to pull data from a third-party HTTP service. All of the requests are independent of each other, so they could be fully parallelized. How can I tell the SDK to make requests concurrently rather than sequentially?
k
Hey Artem 👋 The best way to support this today is to split your integration into separate pipelines using selectors and plugin inheritance. This effectively runs multiple instances (e.g. one per stream) of an otherwise single-threaded tap 🙂 There is some discussion on strategies for parallelism in this issue. Would be great to get your comments on it!
a
@ken_payne is correct with regard to the orchestration-layer parallelization options. If you'd like to parallelize streams at the tap layer, this would be a new feature in the SDK. It has certainly come up before, and it is also difficult to parameterize generically. There are probably two layers of parallelization in the tap layer: running multiple streams in parallel, and asynchronously syncing child streams. For paginated REST streams, a third option might be to immediately and asynchronously kick off the next page's request while still processing the returned result of the current page. @artem_vysotsky - Would you be able to log an issue describing how you are thinking about this and which approach(es) you'd be interested in seeing? https://gitlab.com/meltano/sdk/-/issues/new
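To make the third option concrete, here is a minimal sketch of prefetching the next page in a background thread while the current page is still being processed. It is plain Python and not part of the Meltano SDK. The names `fake_fetch_page` and `iter_records_with_prefetch`, and the page shape `{"records": [...], "next": ...}`, are hypothetical and only for illustration. It also assumes the next-page token is available as soon as a response arrives.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical in-memory "API": page token -> records on that page.
_FAKE_PAGES = {0: ["a", "b"], 1: ["c", "d"], 2: ["e"]}


def fake_fetch_page(token: int) -> Dict[str, Any]:
    """Stand-in for an HTTP request; returns records plus the next page token."""
    next_token: Optional[int] = token + 1 if (token + 1) in _FAKE_PAGES else None
    return {"records": _FAKE_PAGES[token], "next": next_token}


def iter_records_with_prefetch(
    fetch_page: Callable[[Any], Dict[str, Any]],
    first_token: Any,
) -> Iterator[Any]:
    """Yield records page by page, requesting page N+1 while page N is consumed."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_page, first_token)
        while future is not None:
            page = future.result()  # wait for the page requested earlier
            next_token = page["next"]
            # Start the next request before handing records to the caller.
            future = (
                pool.submit(fetch_page, next_token)
                if next_token is not None
                else None
            )
            yield from page["records"]


if __name__ == "__main__":
    for record in iter_records_with_prefetch(fake_fetch_page, 0):
        print(record)
```

Records are still emitted in page order, so only the network wait overlaps with processing. With a single worker, at most one request is in flight at a time, which keeps the load on the upstream service predictable.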