# infra-deployment
b
First-timer with Meltano/Airflow here. Currently our data engineering stack has the following components:
1. Stable Kubernetes cluster (1.17)
2. Jenkins-based CI/CD pipeline for Helm-based multi-environment deployments of microservices to Kubernetes
3. Jenkins-based CI/CD pipeline to publish Docker images from a GitHub repository
4. Harbor repository for storing Helm charts and Docker images
5. Terraforming with Atlantis
6. RDS on AWS

We have built our initial pipeline, which deploys as a custom Docker image with our Meltano app and files baked in. It works well with manual execution. But now I want to build Meltano/Airflow pipelines as files, preferably stored in GitHub.
1. I am looking for options to build a dev and CI/CD cycle for my Meltano data pipelines using Airflow.
2. How do I publish `meltano.yml` + supporting files to an existing/remote Airflow server and have it schedule something to be executed? How does my code from GitHub get pushed to Airflow?
p
Cross-linking the other discussion we had on this: https://meltano.slack.com/archives/CMN8HELB0/p1645546536361949
@binoy_shah Check out https://gitlab.com/meltano/squared. The way we do it is we deploy an Airflow image and a Meltano image with the project files and all plugins pre-installed, then use the KubernetesPodOperator to run a single execution of the Meltano image as a pod with a particular command like `meltano elt tap-x target-y`. The DAGs are packaged in the Airflow image and are redeployed to the cluster on a merge to the git main branch. This way, when we merge code, our CI/CD process deploys the new Airflow DAGs, etc., and our new Meltano configurations.

To assist with this, we use a custom dag_generator (based on the original dag_generator that's added to your project with `meltano add orchestrator airflow`) that uses a YAML file to define DAGs, but you can also use the default schedule feature. Our custom DAG generator allows toggling between a local bash executor and the Kubernetes pod operator based on the environment you're running in, i.e. local dev vs. deployed prod.
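For illustration, the environment toggle could look roughly like the sketch below. This is not the actual generator from the squared project, just a minimal stand-in in plain Python. The `DEPLOY_ENV` variable, the `example/meltano-project:latest` image name, and the `DAG_DEFINITIONS` dict (standing in for the parsed YAML file) are all made up for the example. It only builds the operator name and keyword arguments; the real generator would pass them to Airflow's `BashOperator` or `KubernetesPodOperator`, whose import paths depend on the Airflow and provider versions installed.

```python
import os
import shlex

# Hypothetical DAG definitions. In a real project these would probably be
# parsed from a YAML file rather than hard-coded.
DAG_DEFINITIONS = {
    "tap_x_to_target_y": {
        "schedule": "@daily",
        "command": "meltano elt tap-x target-y",
    },
}


def task_spec(dag_id, definition, environment,
              image="example/meltano-project:latest"):
    """Return (operator_name, kwargs) for a single Meltano run.

    In "prod" the command runs as a pod from the pre-built Meltano image;
    anywhere else it runs as a local shell command.
    """
    command = definition["command"]
    if environment == "prod":
        argv = shlex.split(command)
        return "KubernetesPodOperator", {
            "task_id": dag_id,
            # Pod names may not contain underscores.
            "name": dag_id.replace("_", "-"),
            "image": image,  # assumed image name, not a real one
            "cmds": argv[:1],       # the executable, e.g. ["meltano"]
            "arguments": argv[1:],  # the rest, e.g. ["elt", "tap-x", "target-y"]
        }
    return "BashOperator", {"task_id": dag_id, "bash_command": command}


def generate(environment=None):
    """Build a task spec for every definition, keyed by DAG id."""
    # DEPLOY_ENV is a hypothetical variable name; default to local dev.
    environment = environment or os.environ.get("DEPLOY_ENV", "local")
    return {
        dag_id: task_spec(dag_id, definition, environment)
        for dag_id, definition in DAG_DEFINITIONS.items()
    }
```

Calling `generate("prod")` gives pod-style arguments, while `generate("local")` gives a plain `bash_command`, so the same definitions can be exercised on a laptop without a cluster.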