# best-practices
j
I was wondering if the community has suggestions on how to best architect a cloud ETL platform using:
1. Meltano as the ELT tool
2. Dagster as the wrapper around Meltano (and potentially using the upcoming integration as well)
3. Deployment in an AWS ECS/Fargate cluster
The reason for using Dagster as a wrapper is that the config for each tap source needs to be dynamically refreshed each day before the ELT pipelines are run.
Specific questions:
1. What containers/tasks would you set up in ECS/Fargate? What sizes?
2. How would you manage CI/CD with updates to the Meltano and/or Dagster configuration?
3. Would you use the Dagster integration with Meltano in addition to the Dagster wrapper?
4. How much parallelization is recommended among pipeline runs?
My initial thoughts:
1. ECS containers/tasks for:
   a. Dagster Dagit instance (.25 vCPU, .5 GB Mem)
   b. Dagster Daemon (.25 vCPU, .5 GB Mem)
   c. Meltano (.25 vCPU, .5 GB Mem)
2. Deploy source code to EFS mounted to the Meltano container and refresh it immediately on every git pull request completion
3. Probably not necessary to use the integration, because the Dagster wrapper will already be scheduling runs
4. Parallelize as much as possible, because why not? A limit may be needed for cases where too many connections to taps/targets could cause problems
d
we’re going through this exercise right now. we’ve set up a deployment pipeline via GitHub Actions for Meltano that simply builds + publishes a new Meltano project image to ECR and updates the ECS task definition accordingly. All Dagster will be aware of is the task family that it needs to trigger via ECS
j
Glad to hear someone else is doing something similar!!! I thought about that approach. My hesitancy with it was the daily Meltano config refresh that has to happen before the pipeline runs. It is, in effect, a change to the source code that we would not be tracking in git. So, if I’m not mistaken, I would have to build, push, etc. a new image each day. That didn’t seem terribly efficient …
e
You may be able to set your dynamic tap configs via environment variables. So basically the ECS equivalent of
docker run -e "TAP_CONFIG_VAR=<SOME DYNAMIC VALUE>"
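For the ECS side, a rough sketch of what that could look like from Python, assuming the task is started with boto3's ECS `run_task` and the variable is passed through `containerOverrides`. The cluster, task family, container name, and subnet values are placeholders, not anything from this thread, and the helper names are made up for illustration.

```python
from typing import Any, Dict, List


def build_run_task_kwargs(
    cluster: str,
    task_family: str,
    container_name: str,
    env: Dict[str, str],
    subnets: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's ECS ``run_task`` call.

    All argument values are hypothetical placeholders supplied by the caller.
    """
    return {
        "cluster": cluster,
        # Passing only the family name runs the latest ACTIVE revision.
        "taskDefinition": task_family,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    # Per-run equivalent of `docker run -e NAME=VALUE`.
                    "environment": [
                        {"name": name, "value": value}
                        for name, value in sorted(env.items())
                    ],
                }
            ]
        },
    }


def run_meltano_task(**kwargs: Any) -> Dict[str, Any]:
    """Start the task; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the builder above has no AWS dependency

    return boto3.client("ecs").run_task(**build_run_task_kwargs(**kwargs))
```

Because the override is applied per run, the image and task definition stay unchanged; only the values passed in `env` differ from day to day.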
j
I am doing that, but unfortunately it's the select expressions that need to be updated
I guess I haven't looked to see if you can provide select expressions in an environment variable ... that would totally solve my problem. Does anyone know if you can do that?
Well I guess I should read the docs more ... I'll answer my own question https://meltano.com/docs/plugins.html#select-extra
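For anyone finding this later, a small sketch of how the daily select expressions could be turned into an environment variable. It assumes Meltano's documented convention for the `select` extra (`<EXTRACTOR>__SELECT`, holding a JSON array of patterns); the helper name, the `tap-gitlab` plugin, and the patterns are only examples, so check the linked docs for your Meltano version.

```python
import json
from typing import Dict, List


def select_env_var(plugin_name: str, patterns: List[str]) -> Dict[str, str]:
    """Return {ENV_NAME: JSON array string} for a plugin's `select` extra.

    Assumes the env var name is the plugin name upper-cased, with dashes
    replaced by underscores, followed by `__SELECT`.
    """
    env_name = plugin_name.upper().replace("-", "_") + "__SELECT"
    return {env_name: json.dumps(patterns)}


# Example with hypothetical patterns refreshed by the daily job:
env = select_env_var("tap-gitlab", ["commits.*", "project_members.*"])
print(env)
```

The resulting mapping can be passed to the container as ordinary environment variables at run time, so no new image has to be built for the daily refresh.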