Is anyone using Meltano with AWS CloudWatch Events...
# infra-deployment
f
Is anyone using Meltano with AWS CloudWatch Events / EventBridge as the orchestrator? I know it is not an official orchestrator integrated into Meltano, but Airflow on k8s, even on EKS, is a bit more than we may want to manage. Since Lambda is essentially running a Docker container anyway, I am thinking we can use scheduled events to call either Lambda functions directly or Step Functions to run pipeline jobs.
So here's the idea: create a Docker image for Lambda that takes job specs as input (the event data passed to the function). The spec would list the git repo to pull, the tap, target, job_id, and any other necessary parameters. When the Lambda function, or Step Function, runs, it would set up .env with the appropriate secrets/credentials and then run meltano with the appropriate command line. If it works, an official orchestrator could be written in which Meltano spits out the Terraform needed to create the Step Functions / EventBridge configuration, based on the schedule that Meltano maintains.
Lambda functions obviously have a limited run-time. But this would only be the EL portion; they would not be doing much, if any, T. And with Step Functions we can save state and run again if we are running out of time and not caught up. The first use-case on AWS's site for Step Functions is "Automate Extract, Transform, Load (ETL) process." Thoughts?
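To make the idea concrete, here is a minimal sketch of how such a handler might turn the event into a meltano command line. The event keys (tap, target, job_id, env, project_dir) and the function names are hypothetical, not anything Meltano or AWS defines. It assumes the Meltano project is already present in the image, so the git pull and .env setup are left out, and it uses the --job_id option of meltano elt, which newer Meltano releases may name differently.

```python
import os
import shlex
import subprocess

# Hypothetical job-spec keys; this event shape is an assumption for illustration.
REQUIRED_KEYS = ("tap", "target", "job_id")


def build_meltano_command(event):
    """Turn a job-spec event into an argv list for `meltano elt`."""
    missing = [key for key in REQUIRED_KEYS if key not in event]
    if missing:
        raise ValueError("job spec is missing: " + ", ".join(missing))
    return [
        "meltano", "elt",
        event["tap"], event["target"],
        "--job_id", event["job_id"],
    ]


def build_env(event, base_env):
    """Layer per-job settings over the base environment without mutating it."""
    env = dict(base_env)
    env.update({str(k): str(v) for k, v in event.get("env", {}).items()})
    return env


def handler(event, context):
    """Lambda-style entry point: run the pipeline and report the exit code."""
    command = build_meltano_command(event)
    result = subprocess.run(
        command,
        env=build_env(event, os.environ),
        cwd=event.get("project_dir", "."),  # assumed location of the project
        check=False,  # report the exit code rather than raising
    )
    return {"command": shlex.join(command), "returncode": result.returncode}
```

A scheduled EventBridge rule could then pass something like {"tap": "tap-gitlab", "target": "target-postgres", "job_id": "gitlab-to-postgres"} as the event. Secrets would more likely come from the function's configured environment than from the event itself.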
a
I can say from experience, the overall pattern is viable. In a past life, we ran this workflow: CloudWatch Events trigger a Step Function, which runs an ECS Fargate Task defined by an ECR-backed Docker image, with state stored in S3. Creds were backed by AWS Parameter Store, which mapped from the config ref to environment variables in the defined ECS task. The step functions also created a nice "clickable" invoke point from AWS Web Console, as well as providing a means for max runtime and retry logic.
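For illustration, a Step Function of that shape could be described roughly as follows. This is a sketch of an Amazon States Language definition built as a Python dict, not the definition we actually ran: the state name RunMeltano, the function name, and the default timeout and retry values are all assumptions, and the cluster, task definition, and subnets are placeholders you would supply.

```python
import json


def build_state_machine(cluster_arn, task_definition_arn, subnets,
                        timeout_seconds=3600, max_attempts=2):
    """Build a one-state definition that runs a Fargate task and waits for it."""
    run_task = {
        "Type": "Task",
        # The .sync suffix makes Step Functions wait for the ECS task to finish.
        "Resource": "arn:aws:states:::ecs:runTask.sync",
        "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": cluster_arn,
            "TaskDefinition": task_definition_arn,
            "NetworkConfiguration": {
                "AwsvpcConfiguration": {"Subnets": list(subnets)}
            },
        },
        # Max runtime: the state fails with States.Timeout after this long.
        "TimeoutSeconds": timeout_seconds,
        # Retry logic: re-run the task on failure or timeout, with backoff.
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed", "States.Timeout"],
                "IntervalSeconds": 60,
                "MaxAttempts": max_attempts,
                "BackoffRate": 2.0,
            }
        ],
        "End": True,
    }
    return {
        "Comment": "Run a Meltano pipeline as an ECS Fargate task (sketch)",
        "StartAt": "RunMeltano",
        "States": {"RunMeltano": run_task},
    }


if __name__ == "__main__":
    # Placeholder identifiers, for printing the JSON only.
    definition = build_state_machine("my-cluster", "my-task-def", ["subnet-example"])
    print(json.dumps(definition, indent=2))
```

The resulting JSON is what you would hand to Step Functions as the state machine definition, whether through the console, CloudFormation, or Terraform.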
What isn't in scope right now is to have Meltano orchestrate the above with a native understanding of Step Functions and the surrounding AWS complexities. That said, we are working on a Terraform module which would (in theory) be able to accept a meltano.yml file (or similar Meltano project artifact) as its input. In theory, most of the logic you describe could be built into a Terraform module. We're still exploring the details of how much Meltano would/should need to understand AWS infra in order to enable scenarios like these.
Is this helpful at all? What do you think?
I would not currently recommend Lambda as a mainstream/out-of-box execution method, both for the obvious timeout reasons and because a long-running process could end up permanently stuck, never able to finish or progress beyond the next bookmarkable increment. ECS Fargate gives a similar on-demand, zero-always-on-cost profile, with a tradeoff of longer startup times. More discussion around Lambda is here if you are interested.
c
I use CW events as triggers. You can see my full stack here https://github.com/chriskl/meltano-fargate-cloudformation
You’re better off using AWS Batch though
Batch is by far the simplest way to run these kinds of batch Docker jobs
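As a rough sketch of what that looks like, this builds the arguments for the boto3 Batch client's submit_job call. The helper names are hypothetical, the queue and job definition names are placeholders that must already exist in your account, and the --job_id option of meltano elt may be named differently in newer Meltano releases.

```python
def build_batch_job_request(job_name, job_queue, job_definition, tap, target, job_id):
    """Build keyword arguments for the Batch client's submit_job call."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Override the container command so one job definition serves many pipelines.
        "containerOverrides": {
            "command": ["meltano", "elt", tap, target, "--job_id", job_id],
        },
    }


def submit(request):
    """Submit the job; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so building a request does not require boto3

    response = boto3.client("batch").submit_job(**request)
    return response["jobId"]
```

A CloudWatch Events / EventBridge rule can also target a Batch job queue directly, in which case no submitting code is needed at all.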
f
Cool! Thank you both for the feedback. Much appreciated!