# getting-started
a
Hi everyone, I have been active in the community lately because I want to pitch using Meltano for something. I was thinking of running Meltano on AWS ECS or Fargate (kinda wish there was more documentation on this, but my Google skills failed me). If that is possible, I'd like to use AWS IAM roles with Meltano environments to run different pipelines in separate containers, keeping data very siloed so we don't mix data between projects or by mistake access something that shouldn't be accessed during the process. What are your thoughts? I only know this at a high level so far; I just want to know if anyone has hit any pitfalls, or if there is anything I am not thinking of.
p
I haven't personally used Meltano + ECS, although I have used ECS separately, and we do something similar to what it sounds like you're looking for using a custom Airflow DAG generator that uses the KubernetesPodOperator. You'd have to check this, but you should be able to get the role isolation you want by overriding the `taskRoleArn` when calling ECS to run your container; I think that will restrict the container from accessing anything it shouldn't. In terms of Meltano configuration, it seems like using environments would work to pass the different roles around.
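For anyone who wants to try the `taskRoleArn` idea, here is a rough sketch of what the ECS call could look like, not a tested deployment. It builds the arguments for the ECS `RunTask` API with a per-run role override and passes the Meltano environment on the command line. The cluster, task definition, container name, role ARN, subnet, and tap/target names are all made-up placeholders, and it assumes the image does not already set `meltano` as its entrypoint (if it does, drop the leading `"meltano"` from the command).

```python
def build_run_task_request(cluster, task_definition, container_name,
                           task_role_arn, meltano_environment, job,
                           subnets, security_groups):
    """Build keyword arguments for the ECS RunTask API.

    The per-run IAM role goes in overrides.taskRoleArn, so each pipeline
    only gets the permissions attached to its own role.
    """
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "taskRoleArn": task_role_arn,
            "containerOverrides": [
                {
                    "name": container_name,
                    # Assumption: the image has no meltano ENTRYPOINT.
                    "command": [
                        "meltano",
                        f"--environment={meltano_environment}",
                        "run",
                        *job,
                    ],
                }
            ],
        },
    }


def run_pipeline(request):
    """Submit the request to ECS (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("ecs").run_task(**request)


# Hypothetical values, for illustration only.
request = build_run_task_request(
    cluster="data-pipelines",
    task_definition="meltano-runner",
    container_name="meltano",
    task_role_arn="arn:aws:iam::123456789012:role/project-a-pipeline",
    meltano_environment="project_a",
    job=["tap-example", "target-example"],
    subnets=["subnet-0example"],
    security_groups=["sg-0example"],
)
```

Keeping the request builder separate from the call means the role and environment wiring can be checked without touching AWS. Separate security groups per project would be one way to add the network side of the isolation.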
I'm curious from this thread https://meltano.slack.com/archives/CMN8HELB0/p1658307475849349?thread_ts=1658257724.289329&cid=CMN8HELB0 how @rick_smit is doing it. It would be cool to see a generically useful version of this so others could use it too.
Also, cross-posting or moving this thread to #C01E3MUUB3J could be good, since that seems like the right place to discuss this 😄
a
P, funny thing that there is another post asking how Meltano and ECS work together. Feel free to cross-post though; I do feel there should be a generic way to set this up.
a
100% see how this can work. We are isolating Meltano projects in a shell on a Kubernetes pod. Network isolation as well would be perfect; it might only take a small tweak to what we are doing now 🤔
a
@aaron_phethean, I am a little bit lost on how to accomplish this. I am happy you see the value of data and network isolation.
a
Our workspaces get a Meltano project each. A ‘task’ is an import or report or something else run by Meltano, and it is launched by a module similar to the Airflow operator. The task needs to do the following:
• git clone + meltano install the project (or that is built into the container in advance)
• Set up the environment for the ‘meltano run …’ (this is the trickier part to isolate, as not all properties can go into meltano.yml; we set up bash environment variables with our own shell launcher from our central tenant database, but they could come from Kubernetes secrets), then finally ‘meltano run’
• We also collect the logs back to the tenant-owned workspace -> pipeline -> job, plus record events and job status
Setting this up in an isolated way manually is straightforward enough, albeit you have a fair amount of Kubernetes, or ECS, or Argo template, or Terraform, or whatever platform work to do. Making it self-service for anyone in an isolated way is pretty involved. Pretty stoked with what we've built now.
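To make the steps above a bit more concrete, here is a minimal sketch of a launcher in Python. It is not the launcher described in this thread, just one way the same sequence (clone, install, run with injected environment variables, collect logs) might look. The function names, repo URL, working directory, and tap/target names are hypothetical, and it assumes `git` and `meltano` are on the PATH inside the container.

```python
import os
import subprocess


def build_steps(repo_url, workdir, meltano_environment, job):
    """Return the (command, cwd) pairs a task runs, in order."""
    return [
        (["git", "clone", "--depth", "1", repo_url, workdir], None),
        (["meltano", "install"], workdir),
        (["meltano", f"--environment={meltano_environment}", "run", *job], workdir),
    ]


def run_task(steps, secrets, log_path):
    """Run each step with secrets injected as environment variables.

    All output is appended to log_path so it can be shipped back to the
    owning workspace afterwards. Returns the first non-zero exit code,
    or 0 when every step succeeds.
    """
    env = {**os.environ, **secrets}
    with open(log_path, "ab") as log:
        for command, cwd in steps:
            result = subprocess.run(
                command, cwd=cwd, env=env,
                stdout=log, stderr=subprocess.STDOUT,
            )
            if result.returncode != 0:
                return result.returncode
    return 0


# Hypothetical values, for illustration only.
steps = build_steps(
    repo_url="https://example.com/acme/meltano-project.git",
    workdir="/tmp/meltano-project",
    meltano_environment="tenant_a",
    job=["tap-example", "target-example"],
)
```

If the project is baked into the image in advance, the first two steps can simply be dropped from the list.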
a
@aaron_phethean sounds pretty awesome; have you thought of using a secrets manager for sharing environment variables? I was thinking that if we can make Dockerfiles with arguments plus use a secrets manager, then we accomplish something in a self-service-ish way.
a
Yeah, there are a few options. I think the main challenge would always be managing the secrets manager. In our database we store the secrets encrypted, based on the tap setting being of kind password. I suppose our shell is a kind of secrets manager, as all it's doing with properties is putting them into the environment…. The way it works now, we can run locally or in Kubernetes. If we were to create Kubernetes secrets instead that would work, but I'm not sure there's any net gain. I recall there is an issue for Meltano to support a secrets store; the benefit would be similar to our approach: you could manage secrets in one place, then run your Meltano anywhere.
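As a rough illustration of "putting them into the environment", here is a sketch that turns a JSON secret into environment variables for a plugin. It assumes the usual Meltano naming of `<PLUGIN_NAME>_<SETTING_NAME>` in upper case with dashes turned into underscores, and it does not try to handle nested settings. The plugin name, setting names, and values are made up, and the AWS Secrets Manager fetch is just one possible backend.

```python
import json


def setting_env_var(plugin_name, setting_name):
    """Map a plugin setting to an environment variable name.

    Assumption: flat settings only, named <PLUGIN_NAME>_<SETTING_NAME>.
    """
    return f"{plugin_name}_{setting_name}".replace("-", "_").upper()


def secrets_to_env(plugin_name, secret_string):
    """Turn a JSON object of settings into an environment mapping."""
    settings = json.loads(secret_string)
    return {
        setting_env_var(plugin_name, name): str(value)
        for name, value in settings.items()
    }


def fetch_secret_string(secret_id):
    """Read a secret from AWS Secrets Manager (needs boto3 and credentials)."""
    import boto3  # imported lazily so the mapping works without boto3

    client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=secret_id)["SecretString"]


# Hypothetical secret payload, for illustration only.
env = secrets_to_env(
    "tap-example",
    '{"api_url": "https://api.example.com", "password": "not-a-real-password"}',
)
```

The resulting mapping can be merged into the environment of whatever process ends up calling `meltano run`, whether that is a local shell, a Kubernetes pod, or an ECS task.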
s
You can view an example of our deployment using AWS - ECS Fargate in this thread on #C01E3MUUB3J: https://meltano.slack.com/archives/C01E3MUUB3J/p1658761388902909