# random
g
Hello team 👋 We are exploring using Meltano as the backend for moving customers' data from their premises to ours. Do you think Meltano is the right tool for this? Would it scale to thousands of customers with gigabytes of data each? How should we go about it? A Meltano project per customer? Happy to read any documentation/articles you could point me at. Thanks!
t
Meltano is a good choice for this 🙂 There are a few ways you could do this. One Meltano project per customer is a potential solution, but we’ve also seen folks use plugin inheritance to override a single variable for each customer. Saves on build time and keeps the config fairly tidy.
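To make the inheritance idea concrete, here's a minimal `meltano.yml` sketch of one base extractor with per-customer child plugins. The plugin names and host values are made up for illustration; `inherit_from` is the Meltano mechanism being described:

```yaml
# meltano.yml (sketch): one base extractor, one inheriting entry per customer.
plugins:
  extractors:
    - name: tap-postgres
      variant: meltanolabs
      config:
        port: 5432
    - name: tap-postgres--customer-a      # hypothetical customer plugin name
      inherit_from: tap-postgres          # reuses the base plugin's install and config
      config:
        host: customer-a.example.com      # only the per-customer value is overridden
    - name: tap-postgres--customer-b
      inherit_from: tap-postgres
      config:
        host: customer-b.example.com
```

Each inheriting plugin shares the base plugin's installation, so adding a customer is just a few lines of config rather than a new project.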
g
interesting! thanks for the pointer
t
I’m pinging the engineering team internally as well b/c we definitely have a few folks in the community who have this sort of architecture but I’m having trouble remembering exactly who (I blame this cough I got from Coalesce…)
a
Welcome, @gerard_clos. Another option recently opened up: for scenarios where you want basically the same process repeated for different customers, you can write the pipeline once and use the new `--state-id-suffix` to have a separate set of state trackers for each client. Do each of the clients have a similar/same schema and source DBs? https://docs.meltano.com/concepts/environments#state-id-suffix
Cc @Reuben (Matatika) who contributed the above feature. 🙂
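A sketch of how that might look in `meltano.yml`, keeping one pipeline definition but a distinct state bookmark per customer. The `CUSTOMER_ID` environment variable here is an assumption, set per run by whatever orchestrates the pipeline:

```yaml
# meltano.yml (sketch): one environment, state ID suffixed per customer
# so each customer's incremental bookmarks are tracked separately.
environments:
  - name: prod
    state_id_suffix: ${CUSTOMER_ID}   # hypothetical env var injected per run
```

With this in place, running the same `meltano run` invocation for different `CUSTOMER_ID` values keeps each customer's incremental state independent.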
g
Hi @aaronsteers, the destination would be the same for all customers but not the source: customers can customise their data source, and that could be anything from the G Sheets API to a Postgres DB
a
Hi @gerard_clos - we create a project per customer and the customer manages the settings in our world. The state id suffix became necessary for us as there are multiple sources of the same type. I’d be happy to show you what we do, ping me a DM if you like.
One more thing: GBs of data are not an issue. State means that your loads can restart and pick up where they left off without issue. There are some tuning and memory considerations, of course. The batch feature coming soon should be a step change in sync performance with really large volumes.