# best-practices
s
Looking for advice on running Meltano in conjunction with Prefect. My first instinct is to containerize Meltano rather than have a separate deployment, so there'd be one repo in GitLab for Meltano (containerized) and another repo for the orchestration (say, Meltano-Flows). I see two options here:
• Use Prefect shell commands to run Meltano. This works fine if we use the containerized image on our Dask worker, but it's a challenge to test locally by just running `python flows/my_flow.py`, since we're not running it in a Docker container during development.
• Use Prefect Docker tasks. This works well locally, but now we have the challenge of Docker-in-Docker on our Dask workers.
Am I missing something? Anyone else have experience with Prefect + Meltano?
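(For reference, a minimal sketch of the shell-command option, assuming Prefect 1.x with `ShellTask` and a Meltano project baked into the image at `/project`; the path, tap, and target names are placeholders, not taken from the thread.)

```python
# Sketch of the "Prefect shell commands" option under the assumptions above.
from prefect import Flow, Parameter, task
from prefect.tasks.shell import ShellTask

# ShellTask runs the command in a subprocess on the worker; the Meltano CLI
# must be on PATH inside the container. /project is a hypothetical location
# for meltano.yml.
shell = ShellTask(helper_script="cd /project", log_stderr=True, return_all=True)


@task
def elt_command(tap: str, target: str, job_id: str) -> str:
    # Build the CLI string at runtime so Parameter values are resolved.
    # Note: flag spelling varies across Meltano versions (--job_id here).
    return f"meltano elt {tap} {target} --job_id {job_id}"


with Flow("meltano-elt-shell") as flow:
    tap = Parameter("tap", default="tap-gitlab")
    target = Parameter("target", default="target-postgres")
    job_id = Parameter("job_id", default="gitlab-to-postgres")
    shell(command=elt_command(tap, target, job_id))

if __name__ == "__main__":
    # Running `python flows/my_flow.py` locally only works if the Meltano CLI
    # is installed on the host -- exactly the local-testing gap noted above.
    flow.run()
```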
t
I don’t have experience with it but am keen to learn more about folks’ experiences
s
Going through some POCs now. I can post an update once I get something working that doesn’t involve a ton of overrides/manual config.
t
@sam_werbalowsky how did the POC go with this?
s
I got it running with the Prefect shell commands. Probably the best thing to do is to package the orchestration together with the Meltano configuration, for simplicity’s sake; that way everything would be in one image.

For concurrency, I believe you’d have to run one tap on one Dask worker. I don’t know whether the Airflow integration allows you to run one tap across multiple workers for parallel processing.

I would have a file structure something like this for productionizing (see the sketch after this message):
• meltano
  ◦ meltano.yml
  ◦ other Meltano folders
• prefect
  ◦ flows
    ▪︎ meltano_flow.py
  ◦ tasks
    ▪︎ meltano_tasks.py (build a task that takes in tap, target, and job_id)
• Dockerfile (configured to package everything together)

The way I have our Prefect helper set up, each flow needs a path to the flow on git, so you could use this structure with no problem. I still need to do a little work to configure the Dask workers and Kubernetes cluster (I was getting descheduled pods for long jobs). The other thing to work through: if a job suddenly terminates, how do we update Meltano’s database (Postgres) so it knows the run failed? In some instances I was left with jobs stuck in a running state.
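(A hedged sketch of what `meltano_tasks.py` plus `meltano_flow.py` could look like under that layout, again assuming Prefect 1.x; the project path, defaults, and retry settings are illustrative assumptions, not taken from the thread. Unlike the ShellTask version above, this wraps the CLI in a subprocess call so the Prefect task state tracks the exit code.)

```python
# Combined sketch of prefect/tasks/meltano_tasks.py and
# prefect/flows/meltano_flow.py under the assumptions above.
import subprocess
from datetime import timedelta

from prefect import Flow, Parameter, task


@task(max_retries=2, retry_delay=timedelta(minutes=1))
def meltano_elt(tap: str, target: str, job_id: str, project_dir: str = "/project"):
    """Run one `meltano elt` invocation; raise on a non-zero exit code so
    Prefect's task state (and retries) reflect what Meltano actually did."""
    result = subprocess.run(
        ["meltano", "elt", tap, target, "--job_id", job_id],
        cwd=project_dir,  # hypothetical directory containing meltano.yml
    )
    if result.returncode != 0:
        raise RuntimeError(f"meltano elt exited with code {result.returncode}")


# One elt invocation per flow run keeps each tap on a single Dask worker,
# matching the one-tap-per-worker concurrency note above.
with Flow("meltano-elt") as flow:
    tap = Parameter("tap", default="tap-gitlab")
    target = Parameter("target", default="target-postgres")
    job_id = Parameter("job_id", default="gitlab-to-postgres")
    meltano_elt(tap, target, job_id)
```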
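(On the stuck-running problem, one possible workaround is a periodic cleanup against the system database. A sketch, assuming Meltano’s legacy `job` table with `state`, `started_at`, and `ended_at` columns; the schema differs across Meltano versions, and the DSN and cutoff are placeholders, so verify against your own system database before using anything like this.)

```python
# Hedged cleanup sketch: mark long-stuck RUNNING rows as FAIL so descheduled
# pods don't leave Meltano believing a job is still in flight.
from datetime import datetime, timedelta

import sqlalchemy as sa

engine = sa.create_engine("postgresql://user:pass@host/meltano")  # placeholder DSN

STALE_AFTER = timedelta(hours=6)  # tune to your longest expected run

with engine.begin() as conn:
    conn.execute(
        sa.text(
            "UPDATE job SET state = 'FAIL', ended_at = :now "
            "WHERE state = 'RUNNING' AND started_at < :cutoff"
        ),
        {"now": datetime.utcnow(), "cutoff": datetime.utcnow() - STALE_AFTER},
    )
```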