joao_paulo_amaral
06/22/2022, 2:10 PM

ken_payne
06/23/2022, 2:27 PM
meltano run CLI and (optionally) a long-running instance of the Meltano UI web service. As Meltano is typically executed as short-lived jobs, and given the UI is optional at this stage, there isn't much to consider in terms of HA for Meltano itself; as long as your scheduler, state database, and workers are available, Meltano will be able to execute.
• A minimal Airflow install would involve installing meltano into your Airflow environment (ideally with pipx) and calling meltano run using the Airflow BashOperator. For ourselves, we use the KubernetesPodOperator to launch a pod from a dedicated Meltano image, which avoids managing Airflow and Meltano dependencies in the same environment. We provide a DAG Generator as part of the Airflow file bundle that can automatically create Airflow DAGs from your Meltano schedules (using the meltano schedule list --format=json command).
• Right now there is no easy way to trigger meltano run commands on a remote service/container via an API from Airflow, though this is definitely a paradigm we are thinking about. We'd love your thoughts and feedback if this is something you'd like to see built!
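The BashOperator approach described above can be sketched as a minimal DAG file. This is an illustrative fragment, not the shipped DAG Generator: the DAG id, schedule, tap/target names, and project path are all assumptions, and it presumes meltano is installed in the Airflow environment (e.g. via pipx).

```python
# Minimal sketch of an Airflow DAG that shells out to the Meltano CLI.
# All names and paths here are illustrative assumptions, not Meltano defaults.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="meltano_tap_to_target",  # hypothetical DAG name
    start_date=datetime(2022, 6, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_el = BashOperator(
        task_id="meltano_run",
        # tap-example / target-example stand in for your real plugins
        bash_command="meltano run tap-example target-example",
        cwd="/opt/meltano-project",  # illustrative project location
    )
```

The KubernetesPodOperator variant mentioned above swaps the BashOperator for a pod launched from a dedicated Meltano image, keeping the two dependency trees separate.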
Hope that helps 🙂

joao_paulo_amaral
06/23/2022, 4:13 PM

michel_ebner
07/15/2022, 6:39 AM

michel_ebner
08/10/2022, 8:40 AM

ken_payne
08/10/2022, 11:10 AM
> Can anyone else validate the drawing?
Looks good! This is how we run Meltano in our own Squared project 🙂
> Where and when do I automatically generate the Meltano DAGs and give Airflow access to them?
There are a couple of ways to do this. The default way, using the DAG Generator shipped with files-airflow, is to install meltano into your Airflow environment so that the DAG Generator can call meltano schedule list to create DAGs from your schedules. Another possibility is to save the JSON output of that list command into your project during CI/CD. This is the approach we take in the Squared project, using a modified DAG Generator in our orchestrate/dags folder, as we find it is faster to read a cached JSON file than to regenerate it on every DAG refresh (Airflow refreshes DAGs every 60s).
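The cached-JSON approach can be sketched roughly as below. The exact schema emitted by meltano schedule list --format=json varies by Meltano version, so the fields used here (name, interval, extractor, loader) are assumptions for illustration; a real generator would then feed these specs into DAG/BashOperator construction.

```python
import json

# Illustrative cached output of `meltano schedule list --format=json`,
# as saved into the project during CI/CD. The schema is an assumption.
CACHED_SCHEDULES = """
[
  {"name": "daily-gitlab", "interval": "@daily",
   "extractor": "tap-gitlab", "loader": "target-jsonl"},
  {"name": "hourly-zendesk", "interval": "@hourly",
   "extractor": "tap-zendesk", "loader": "target-jsonl"}
]
"""

def build_dag_specs(raw_json: str) -> list[dict]:
    """Turn cached schedule JSON into (dag_id, schedule, command) specs.

    Reading a checked-in JSON file avoids shelling out to
    `meltano schedule list` on every Airflow DAG refresh (every 60s).
    """
    specs = []
    for sched in json.loads(raw_json):
        specs.append({
            "dag_id": f"meltano_{sched['name']}",
            "schedule": sched["interval"],
            "command": f"meltano run {sched['extractor']} {sched['loader']}",
        })
    return specs

for spec in build_dag_specs(CACHED_SCHEDULES):
    print(spec["dag_id"], spec["schedule"], spec["command"])
```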
> Should I still redeploy the Meltano long-running pod on every image change?
We redeploy our entire stack (Airflow, Meltano, and Superset) on every image change, for simplicity and repeatability. If you have a larger team or many changes per day (making deploy times painful), an approach with persistent volumes and a git sidecar (or a similar syncing method) might make more sense.
michel_ebner
08/10/2022, 12:04 PM