# best-practices
z
Hi team, just want to learn more about how we could horizontally scale the ELT job. It seems that we could run ELT jobs in different containers. How could we orchestrate different ELT pipelines (extractors) to fetch the data from the same source?
a
Do you have a specific source in mind, for example?
One approach is to split up the streams: create multiple instances of the extractor using the `inherit_from` feature, each configured with a different set of `select` rules so that each instance pulls a different subset of streams from the same source.
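For reference, a minimal sketch of what that could look like in `meltano.yml` (the tap name, variant, and stream patterns here are just examples, not from this thread):

```yaml
plugins:
  extractors:
    - name: tap-salesforce
      variant: meltanolabs
    # Two inherited copies of the same extractor, each selecting a
    # different subset of streams so they can run in parallel.
    - name: tap-salesforce--accounts
      inherit_from: tap-salesforce
      select:
        - Account.*
    - name: tap-salesforce--rest
      inherit_from: tap-salesforce
      select:
        - "*.*"
        - "!Account.*"
```

Each inherited extractor tracks its own state, so the two pipelines should be able to run in separate containers without stepping on each other.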
z
Thanks for the suggestions! We will try that. We are trying to backfill our historical data from Slack and Salesforce (and S3). It may take days if we use a single ELT pipeline, so we are trying to scale it out.
a
That's helpful context - thanks. Generally, per source, you'll have 3-8 tables that account for 90% of the data volume, so putting those into a separate pipeline is often helpful, especially for an initial backfill. Once your 50+ smaller tables are running smoothly and fully caught up, the 3-8 larger ones can be caught up in groups of two or three - or even one table per pipeline in some cases, if a table is very large and/or you want it refreshed more frequently than the others.
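A hedged sketch of how that split might look as Meltano jobs, assuming you've defined inherited extractors via `inherit_from` as discussed above (all names here are illustrative, and `target-snowflake` is just a placeholder loader):

```yaml
jobs:
  # 50+ small tables, kept running smoothly in one pipeline
  - name: salesforce-small-tables
    tasks:
      - tap-salesforce--rest target-snowflake
  # large tables backfilled separately, one (or a few) per pipeline
  - name: salesforce-accounts-backfill
    tasks:
      - tap-salesforce--accounts target-snowflake
```

Each job can then be invoked independently (e.g. `meltano run salesforce-small-tables`), so each one can live in its own container.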