Hi All How would you recommend doing multi tenant user dags Meltano #getting-started

lior_shkiller

07/24/2021, 7:57 AM

Hi All, how would you recommend doing multi-tenant/multi-user DAGs with Meltano? Our use case is to create an ELT process for each of our users (this can scale to thousands), and each user will have a different configuration. We thought about using Meltano because of its ease of use, but it seems like a bit of a workaround because it would require us to duplicate the configuration process for each user. Implementing a for-loop in an Airflow DAG and then running subdags seems like a more appropriate solution. Happy to hear what you think.

aaronsteers

07/24/2021, 4:30 PM

To clarify, do the users need to configure their own DAGs, and are the DAGs (extracts, loads, and transforms) different per user, or is it basically the same steps but with different connections and configurations?

aaronsteers

07/24/2021, 4:32 PM

If it's a similar/same DAG per user, there's an emerging pattern of using something like AWS Parameter Store to create distinct trees of config, which can then be hydrated at runtime by a tool like chamber, or by any other automation strategy that can inject environment variables.
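As a rough illustration of the "inject environment variables at runtime" idea, here is a minimal Python sketch. It assumes Meltano's convention of reading plugin settings from environment variables named `<PLUGIN>_<SETTING>` (upper-cased, dashes replaced by underscores) and the `meltano elt <extractor> <loader> --job_id=...` command. The function names, the tenant ID, and the settings shown are hypothetical, and where the per-tenant settings come from (Parameter Store, a DB, and so on) is left open.

```python
import os
import subprocess


def tenant_env(plugin_settings):
    """Flatten {plugin: {setting: value}} into Meltano-style env vars.

    Assumes the <PLUGIN>_<SETTING> naming convention, e.g.
    tap-gitlab / api_url -> TAP_GITLAB_API_URL.
    """
    env = {}
    for plugin, settings in plugin_settings.items():
        for setting, value in settings.items():
            name = f"{plugin}_{setting}".upper().replace("-", "_")
            env[name] = str(value)
    return env


def run_elt(tenant_id, extractor, loader, plugin_settings, dry_run=True):
    """Build (and optionally run) one tenant's ELT with its own config.

    The job ID is made tenant-specific so that each tenant keeps
    separate incremental state (hypothetical naming scheme).
    """
    cmd = [
        "meltano", "elt", extractor, loader,
        f"--job_id={tenant_id}-{extractor}-{loader}",
    ]
    if dry_run:
        return cmd
    # Tenant settings are layered on top of the current environment.
    env = {**os.environ, **tenant_env(plugin_settings)}
    subprocess.run(cmd, env=env, check=True)
    return cmd


if __name__ == "__main__":
    settings = {"tap-gitlab": {"api_url": "https://gitlab.example.com"}}
    print(tenant_env(settings))
    print(run_elt("acme", "tap-gitlab", "target-jsonl", settings))
```

With this shape, the shared `meltano.yml` stays the same for every tenant and only the environment differs per run.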

lior_shkiller

07/24/2021, 6:19 PM

Thanks for the response. It's a similar DAG per user with different connections/config. We are using GCP, though. We also need to be able to scale it well and manage the users, so we thought we would read the configuration for each user from a DB.
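Reading per-user configuration from a DB could look roughly like the sketch below. The `tenant_settings` table, its columns, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for whatever database is actually used. The result is one settings tree per tenant, which a scheduler loop (for example, a for-loop that generates Airflow tasks) could iterate over.

```python
import sqlite3
from collections import defaultdict


def load_tenant_settings(conn):
    """Return {tenant_id: {plugin: {setting: value}}} from the DB.

    Assumes a hypothetical table:
        tenant_settings(tenant_id, plugin, setting, value)
    """
    rows = conn.execute(
        "SELECT tenant_id, plugin, setting, value "
        "FROM tenant_settings ORDER BY tenant_id"
    )
    configs = defaultdict(dict)
    for tenant_id, plugin, setting, value in rows:
        configs[tenant_id].setdefault(plugin, {})[setting] = value
    return dict(configs)


def demo_connection():
    """Create an in-memory database with two sample tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tenant_settings "
        "(tenant_id TEXT, plugin TEXT, setting TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO tenant_settings VALUES (?, ?, ?, ?)",
        [
            ("acme", "tap-gitlab", "api_url", "https://gitlab.acme.example"),
            ("acme", "target-postgres", "schema", "acme"),
            ("globex", "tap-gitlab", "api_url", "https://gitlab.globex.example"),
        ],
    )
    return conn


if __name__ == "__main__":
    # One iteration per tenant; each would become its own ELT run.
    for tenant_id, settings in load_tenant_settings(demo_connection()).items():
        print(tenant_id, settings)
```

Secrets such as passwords or API tokens would normally live in a secret manager rather than a plain table; the table here only illustrates the lookup shape.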

shagility__agiledata.io_

07/25/2021, 12:58 AM

@lior_shkiller we are currently deploying Singer.io taps on GCP as part of AgileData.io. We are keen to move to Meltano, and our current plan is to use a DB to hold the Meltano config for each tenancy (customer). We have the benefit that we already hold config in a DB as a core part of our platform, so we will just reuse that component. At the moment we hydrate the Singer.io instances on demand when we need to do a load and then blow them away, so that we are not incurring the cost of containers running all the time when they would only be idle. The next step for us is to do a McSpikey (research spike) to prove our current thinking. I'll prioritise that for some time in August, and I'll try to remember to post our findings from the spike.

aaron_phethean

08/03/2021, 2:09 PM

@shagility__agiledata.io_ I like what you are doing with AgileData.io - we've implemented something similar to your idea of holding the per-tenant config and invoking Meltano with that environment. Hoping to make this available to a wide audience next month.
