# random
g
Hello team 👋 We are exploring using Meltano as the backend for moving customers' data from their premises to ours. Do you think Meltano is the right tool for this? Would it scale to thousands of customers with gigabytes of data each? How should we go about it? A Meltano project per customer? Happy to read any documentation/articles you could point me at. Thanks!
t
Meltano is a good choice for this 🙂 There are a few ways you could do this. One Meltano project per customer is a potential solution, but we’ve also seen folks use plugin inheritance to override a single variable for each customer. Saves on build time and keeps the config fairly tidy.
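To make the inheritance idea concrete, here's a minimal `meltano.yml` sketch of one base extractor with per-customer child plugins. The plugin names and host values are made up for illustration; `inherit_from` is the Meltano mechanism being described:

```yaml
# meltano.yml (sketch): one base extractor, one inheriting entry per customer.
plugins:
  extractors:
    - name: tap-postgres
      variant: meltanolabs
      config:
        port: 5432
    - name: tap-postgres--customer-a      # hypothetical customer plugin name
      inherit_from: tap-postgres          # reuses the base plugin's install and config
      config:
        host: customer-a.example.com      # only the per-customer value is overridden
    - name: tap-postgres--customer-b
      inherit_from: tap-postgres
      config:
        host: customer-b.example.com
```

Each inheriting plugin shares the base plugin's installation, so adding a customer is just a few lines of config rather than a new project.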
g
interesting! thanks for the pointer
t
I’m pinging the engineering team internally as well b/c we definitely have a few folks in the community who have this sort of architecture but I’m having trouble remembering exactly who (I blame this cough I got from Coalesce…)
a
Welcome, @gerard_clos. Another option recently opened up: for scenarios where you want basically the same process repeated for different customers, you can write the pipeline once and use the new `--state-id-suffix` to have a separate set of state trackers for each client. Do each of the clients have a similar/same schema and source DBs? https://docs.meltano.com/concepts/environments#state-id-suffix
Cc @Reuben (Matatika) who contributed the above feature. 🙂
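A sketch of how that might look in `meltano.yml`, keeping one pipeline definition but a distinct state bookmark per customer. The `CUSTOMER_ID` environment variable here is an assumption, set per run by whatever orchestrates the pipeline:

```yaml
# meltano.yml (sketch): one environment, state ID suffixed per customer
# so each customer's incremental bookmarks are tracked separately.
environments:
  - name: prod
    state_id_suffix: ${CUSTOMER_ID}   # hypothetical env var injected per run
```

With this in place, running the same `meltano run` invocation for different `CUSTOMER_ID` values keeps each customer's incremental state independent.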
g
Hi @aaronsteers, the destination would be the same for all customers but not the source: customers can customise their data source, and that could be anything from the G Sheets API to a Postgres DB
a
Hi @gerard_clos - we create a project per customer and the customer manages the settings in our world. The state id suffix became necessary for us as there are multiple sources of the same type. I’d be happy to show you what we do, ping me a DM if you like.
One more thing: GBs of data are not an issue. State means that your loads can restart and pick up where they left off without issue. There are some tuning and memory considerations, of course. The batch feature coming soon should be a step change in sync performance with really large volumes.