# best-practices
What’s the best practice for ingesting multiple taps in parallel to separate datasets, and then merging them into a final `stg` dataset after completion?

Use case: ingesting data from multiple Shopify stores. Right now, we run one Meltano pipeline per store, which:
• Extracts raw data into a shared `raw_shopify` dataset in BigQuery
• Creates common views in a single `stg_shopify` dataset

This setup causes some issues. Ideally, we want to:
1. Ingest each store's raw data into its own dataset (e.g. `raw_shopify_store1`, `raw_shopify_store2`, etc.) in parallel
2. Run per-store transforms into separate staging datasets (e.g. `stg_shopify_store1`, etc.)
3. Run a final transform step that unions everything into a central `stg_shopify` dataset

Is there a clean way to do this in Meltano? Any recommendations or patterns others are using?
I guess one option could be a pipeline with multiple extractors and a final transform step?
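For concreteness, roughly this shape in `meltano.yml` (the per-store extractor names and the dbt step are placeholders, not something we've tested):

```yaml
# Hypothetical sketch: one EL job per store, plus a final transform job.
# tap-shopify--store1 / tap-shopify--store2 would be per-store extractor
# definitions; the dbt transformer is assumed to be installed.
jobs:
  - name: el-store1
    tasks:
      - tap-shopify--store1 target-bigquery
  - name: el-store2
    tasks:
      - tap-shopify--store2 target-bigquery
  - name: transform-union
    tasks:
      - dbt:run   # dbt models would union the per-store staging tables
```

`el-store1` and `el-store2` could then be kicked off in parallel from an orchestrator, with `transform-union` running once both finish.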
Hi @Oscar Gullberg 👋 One setup that might work for you is using inheritance along with the extractor's `load_schema` setting:
• https://docs.meltano.com/concepts/project/#inheriting-plugin-definitions
• https://docs.meltano.com/concepts/plugins/#load_schema-extra
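A rough sketch of how that could look in `meltano.yml` (the `shop` setting name depends on your tap-shopify variant, and the base plugin's variant/pip_url are omitted here):

```yaml
# Sketch using plugin inheritance: each inherited extractor keeps the
# base config but loads into its own BigQuery dataset via `load_schema`.
plugins:
  extractors:
    - name: tap-shopify                 # base definition with shared config
      config:
        start_date: "2024-01-01T00:00:00Z"
    - name: tap-shopify--store1
      inherit_from: tap-shopify         # inherits config from the base plugin
      config:
        shop: store1                    # per-store setting (name varies by variant)
      load_schema: raw_shopify_store1   # dataset this store's raw data lands in
    - name: tap-shopify--store2
      inherit_from: tap-shopify
      config:
        shop: store2
      load_schema: raw_shopify_store2
```

Each inherited extractor then writes to its own dataset (the loader's schema/dataset setting typically defaults to the extractor's `load_schema` via pipeline environment variables), and you can run `meltano run tap-shopify--store1 target-bigquery` and the store2 pipeline side by side.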