# infra-deployment
m
Hey all! Hoping for some feedback on an AWS Batch setup using Terraform. I'm somewhat new to Terraform, but Batch seems like a well-supported AWS tool, and I've got it to the point where it runs and does its thing. Wondering if anyone familiar would be able to add some refinement 😄 I've used `tap-smoke-test` and `target-jsonl` to get it spinning, but I also need to consider secrets etc. once I get there: https://github.com/mattarderne/meltano-batch
Something else I've struggled with is deploying the Meltano Docker image. I ended up just going with an Ubuntu Docker image, which requires a valid
ENV MELTANO_DATABASE_URI=postgresql://<user>:<password>@<host>:<port>/<db>
in the Dockerfile. This may no longer be necessary; I just got it working that way and left it.
Primarily needing help with the Terraform AWS Batch setup 😄
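For reference, here is a minimal sketch of what the Terraform side could look like, assuming an image already pushed to ECR and an existing job IAM role (the image URL, role, command, variable names, and sizes below are all placeholders). Passing MELTANO_DATABASE_URI through the job definition would also avoid baking it into the Dockerfile:

```hcl
# Sketch of an AWS Batch job definition for a Meltano container.
# The image URL, role, command, and sizes are placeholders.
resource "aws_batch_job_definition" "meltano" {
  name = "meltano-runner"
  type = "container"

  container_properties = jsonencode({
    image      = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/meltano:latest"
    jobRoleArn = aws_iam_role.meltano_job.arn
    command    = ["meltano", "elt", "tap-smoke-test", "target-jsonl"]

    resourceRequirements = [
      { type = "VCPU", value = "1" },
      { type = "MEMORY", value = "2048" }
    ]

    # Supplying the backend DB URI here (or via the `secrets` block plus
    # Secrets Manager/SSM) keeps it out of the Docker image itself.
    environment = [
      { name = "MELTANO_DATABASE_URI", value = var.meltano_database_uri }
    ]
  })
}
```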
f
Do you have HashiCorp Vault? Even the free/open-source version should work for this, but I'd do this:
• Assign an IAM role to your AWS Batch job
• Configure the AWS auth backend in Vault
• Run Vault Agent as a one-shot (single execution, no need to keep it around) to pull secrets from Vault and populate the .env file
• After Vault Agent runs, run your Meltano job
Yes, you can use AWS Secrets Manager, but flat-out, HashiCorp Vault is the best product for doing things like this.
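Roughly, a one-shot Vault Agent config for that flow might look like this (a sketch only; the Vault address, role name, and template paths are placeholders, and the AWS auth method is assumed to be mounted at its default path):

```hcl
# vault-agent.hcl — run once before the Meltano job, then exit.
exit_after_auth = true

vault {
  address = "https://vault.example.com:8200"
}

# Authenticate using the IAM role attached to the Batch job.
auto_auth {
  method "aws" {
    config = {
      type = "iam"
      role = "meltano-batch"
    }
  }
}

# Render secrets from Vault into the .env file Meltano reads.
template {
  source      = "/etc/vault/meltano.env.ctmpl"
  destination = "/project/.env"
}
```

The container entrypoint would run `vault agent -config=vault-agent.hcl` first, then start the Meltano job once the agent exits.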
v
Someone here was using chamber for secrets (via AWS key management); might be worth a look. I'd have to search for the thread, but they had a good example.
f
To expand on this, you can create a parameter for your AWS Batch job to tell it what its role is (think of the role like a job_id). The AWS IAM role the job authenticates to Vault with would give access to specific paths in Vault. So the secrets for job_id abc123 might be stored in kv/meltano/jobs/abc123, while the secrets for job_id xyz456 would be stored in kv/meltano/jobs/xyz456. That way your secrets are kept separate, and your jobs won't have access to each other's secrets. Of course, you could also define groups and give all jobs in a group access to some group path, to pull common secrets if desired (like the DB creds, which could/should come from Vault's database backend as dynamic secrets). Just be sure to plan out how you want your access mapped out in Vault. If you plan it out, you can have a single policy that gives access to the right secrets for the right roles based on metadata, and keep everything secure.
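As a rough sketch of what the per-job policies could look like (assuming a KV v2 mount at kv/; the job IDs and paths are placeholders):

```hcl
# Policy bound to the Vault role that job abc123 authenticates with.
# With KV v2, reads go through the "data/" prefix.
path "kv/data/meltano/jobs/abc123" {
  capabilities = ["read"]
}

# Shared path every job in the group may read (e.g. common DB creds).
path "kv/data/meltano/common/*" {
  capabilities = ["read"]
}
```

A templated policy keyed on the auth role's metadata could collapse these into the single policy mentioned above.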
c
I suggest using Chamber to hydrate SSM parameters into env vars.
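If you go that route, here is a sketch of how it could tie into the Batch setup (the parameter name, chamber "service" prefix, and variable are placeholders):

```hcl
# Store a secret in SSM Parameter Store under chamber's /<service>/<key> layout.
resource "aws_ssm_parameter" "example_api_token" {
  name  = "/meltano/TAP_EXAMPLE_API_TOKEN"
  type  = "SecureString"
  value = var.tap_example_api_token
}

# In the Batch job definition, wrap the entrypoint with chamber so everything
# under /meltano/ is exported as env vars at runtime, e.g.:
#   command = ["chamber", "exec", "meltano", "--", "meltano", "elt", "tap-smoke-test", "target-jsonl"]
# The job role also needs ssm:GetParameters* and kms:Decrypt for that path/key.
```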
m
Great, thanks for the suggestions! Something to question: I've currently got an RDS instance that each AWS Batch job connects to when it runs; it currently supplies the tokens and captures the run success/failure logs in the `jobs` table. Is there any shortcoming in just persisting the connection tokens in RDS?
I've refined the repo to the point where it works very reliably, and could be entirely scripted. AWS Batch is pretty ideal. I haven't got around to secrets, but will hopefully get there this week.