# best-practices
z
Hi team, just want to learn more about how we could horizontally scale the ELT job. It seems that we could run ELT jobs in different containers. How could we orchestrate different ELT pipelines (extractors) to fetch the data from the same source?
a
Do you have a specific source in mind, for example?
One approach is to split up the streams: create multiple instances of the extractor using the `inherit_from` feature, each configured with a different set of `select` rules so that each instance pulls a different subset of streams from the same source.
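For reference, a minimal sketch of what that could look like in `meltano.yml` (the tap name, variant, and stream patterns here are just examples, not from this thread):

```yaml
plugins:
  extractors:
    - name: tap-salesforce
      variant: meltanolabs
    # Two inherited copies of the same extractor, each selecting a
    # different subset of streams so they can run in parallel.
    - name: tap-salesforce--accounts
      inherit_from: tap-salesforce
      select:
        - Account.*
    - name: tap-salesforce--rest
      inherit_from: tap-salesforce
      select:
        - "*.*"
        - "!Account.*"
```

Each inherited extractor tracks its own state, so the two pipelines should be able to run in separate containers without stepping on each other.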
z
Thanks for the suggestions! We will try that. We are trying to backfill our historical data from Slack and Salesforce (and S3). It may take days if we use a single ELT pipeline, so we are trying to scale it out.
a
That's helpful context - thanks. Generally, per source, you'll have 3-8 tables that account for 90% of the data volume, so putting those into a separate pipeline is often helpful, especially for an initial backfill. Once your 50+ smaller tables are running smoothly and fully caught up, the 3-8 larger ones can be caught up in groups of two or three - or even one table per pipeline in some cases, if a table is very large and/or you want it refreshed more frequently than the others.
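A hedged sketch of how that split might look as Meltano jobs, assuming you've defined inherited extractors via `inherit_from` as discussed above (all names here are illustrative, and `target-snowflake` is just a placeholder loader):

```yaml
jobs:
  # 50+ small tables, kept running smoothly in one pipeline
  - name: salesforce-small-tables
    tasks:
      - tap-salesforce--rest target-snowflake
  # large tables backfilled separately, one (or a few) per pipeline
  - name: salesforce-accounts-backfill
    tasks:
      - tap-salesforce--accounts target-snowflake
```

Each job can then be invoked independently (e.g. `meltano run salesforce-small-tables`), so each one can live in its own container.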