# infra-deployment
m
Hey all! Hoping for some feedback on an AWS Batch setup using Terraform. I'm somewhat new to Terraform, but Batch seems like a well-supported AWS tool, and I've got it to the point where it runs and does its thing. Wondering if anyone familiar would be able to add some refinement 😄 I've used `tap-smoke-test` and `target-jsonl` to get it spinning, but I also need to consider secrets etc. once I get there: https://github.com/mattarderne/meltano-batch
Something else I've struggled with is deploying the Meltano Docker image. I ended up just going with an Ubuntu Docker image, which requires a valid
ENV MELTANO_DATABASE_URI=postgresql://<user>:<password>@<host>:<port>/<db>
in the Dockerfile. This may no longer be necessary; I just got it working that way and left it.
Primarily needing help with the Terraform AWS Batch setup 😄
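For reference, here is a minimal sketch of what the Terraform side could look like, assuming an image already pushed to ECR and an existing job IAM role (the image URL, role, command, variable names, and sizes below are all placeholders). Passing MELTANO_DATABASE_URI through the job definition would also avoid baking it into the Dockerfile:

```hcl
# Sketch of an AWS Batch job definition for a Meltano container.
# The image URL, role, command, and sizes are placeholders.
resource "aws_batch_job_definition" "meltano" {
  name = "meltano-runner"
  type = "container"

  container_properties = jsonencode({
    image      = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/meltano:latest"
    jobRoleArn = aws_iam_role.meltano_job.arn
    command    = ["meltano", "elt", "tap-smoke-test", "target-jsonl"]

    resourceRequirements = [
      { type = "VCPU", value = "1" },
      { type = "MEMORY", value = "2048" }
    ]

    # Supplying the backend DB URI here (or via the `secrets` block plus
    # Secrets Manager/SSM) keeps it out of the Docker image itself.
    environment = [
      { name = "MELTANO_DATABASE_URI", value = var.meltano_database_uri }
    ]
  })
}
```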
f
Do you have HashiCorp Vault? Even the free/open-source version should work for this, but I'd do this:
• Assign an IAM role to your AWS Batch job
• Configure the AWS auth backend in Vault
• Run Vault Agent as a one-shot (single execution, no need to keep it around) to pull secrets from Vault and populate the .env file
• After Vault Agent runs, run your Meltano job
Yes, you can use AWS Secrets Manager, but flat-out, HashiCorp Vault is the best product for doing things like this.
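Roughly, a one-shot Vault Agent config for that flow might look like this (a sketch only; the Vault address, role name, and template paths are placeholders, and the AWS auth method is assumed to be mounted at its default path):

```hcl
# vault-agent.hcl — run once before the Meltano job, then exit.
exit_after_auth = true

vault {
  address = "https://vault.example.com:8200"
}

# Authenticate using the IAM role attached to the Batch job.
auto_auth {
  method "aws" {
    config = {
      type = "iam"
      role = "meltano-batch"
    }
  }
}

# Render secrets from Vault into the .env file Meltano reads.
template {
  source      = "/etc/vault/meltano.env.ctmpl"
  destination = "/project/.env"
}
```

The container entrypoint would run `vault agent -config=vault-agent.hcl` first, then start the Meltano job once the agent exits.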
v
Someone here was using chamber for secrets (via AWS key management); might be worth a look. I'd have to search for the thread, but they had a good example.
f
To expand on this, you can create a parameter for your AWS Batch job to tell it what its role is (think of the role like a job_id). The AWS IAM role the job authenticates to Vault with would give access to specific paths in Vault. So the secrets for job_id abc123 might be stored in kv/meltano/jobs/abc123, while the secrets for job_id xyz456 would be stored in kv/meltano/jobs/xyz456. That way your secrets are kept separate, and your jobs won't have access to each other's secrets. Of course, you could also define groups and give all jobs in a group access to some group path, to pull common secrets if desired (like the DB creds, which could/should come from Vault's database backend as dynamic secrets). Just be sure to plan out how you want your access mapped out in Vault. If you plan it out, you can have a single policy that gives access to the right secrets for the right roles based on metadata, and keep everything secure.
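As a rough sketch of what the per-job policies could look like (assuming a KV v2 mount at kv/; the job IDs and paths are placeholders):

```hcl
# Policy bound to the Vault role that job abc123 authenticates with.
# With KV v2, reads go through the "data/" prefix.
path "kv/data/meltano/jobs/abc123" {
  capabilities = ["read"]
}

# Shared path every job in the group may read (e.g. common DB creds).
path "kv/data/meltano/common/*" {
  capabilities = ["read"]
}
```

A templated policy keyed on the auth role's metadata could collapse these into the single policy mentioned above.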
c
I suggest using Chamber to hydrate SSM parameters into env vars.
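If you go that route, here is a sketch of how it could tie into the Batch setup (the parameter name, chamber "service" prefix, and variable are placeholders):

```hcl
# Store a secret in SSM Parameter Store under chamber's /<service>/<key> layout.
resource "aws_ssm_parameter" "example_api_token" {
  name  = "/meltano/TAP_EXAMPLE_API_TOKEN"
  type  = "SecureString"
  value = var.tap_example_api_token
}

# In the Batch job definition, wrap the entrypoint with chamber so everything
# under /meltano/ is exported as env vars at runtime, e.g.:
#   command = ["chamber", "exec", "meltano", "--", "meltano", "elt", "tap-smoke-test", "target-jsonl"]
# The job role also needs ssm:GetParameters* and kms:Decrypt for that path/key.
```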
m
Great, thanks for the suggestions! Something to question: I've currently got an RDS instance that each AWS Batch job connects to when it runs; it currently supplies the tokens and captures the run success/failure logs in the `jobs` table. Is there any shortcoming in just persisting the connection tokens in RDS?
I've refined the repo to the point where it works very reliably, and could be entirely scripted. AWS Batch is pretty ideal. I haven't got around to secrets, but will hopefully get there this week.