# infra-deployment
j
I'm not sure if this will be useful to others, but I've put together an orchestrator utility that generates Kubernetes CronJob manifests from the Meltano schedule of jobs.
The assumptions about your Meltano project are as follows:
• Your Meltano config defines jobs, and those jobs appear in a schedule configuration
• You'd like to run Meltano jobs on a recurring basis using a container image you build
• Your Meltano project code is built into a container image which can be run in a Kubernetes cluster
• You want to manage the execution and tracking of jobs using Kubernetes, not Airflow
• You are comfortable using kustomize overlays to customize your Kubernetes pod spec
The developer experience is something like this:
• You commit changes to your Meltano project repository
• CI/CD runs and builds your container image from this revision
• CI/CD invokes this utility, which generates the kustomize base-layer Kubernetes manifests
• CI/CD has kubectl set up with a valid kubeconfig for your cluster
• You run
kubectl apply -k orchestrate/kubernetes/production/kustomize.yml
(a file you create ahead of time; see the README for more details) to apply the manifests to your cluster
My team found this to be the most lightweight way of provisioning Kubernetes CronJobs for Meltano jobs in our cluster: the engineer writing new Meltano jobs and configurations does not need to modify any infrastructure definitions to have their jobs run regularly. The Kubernetes CronJob/Job tracking and logs were sufficient and worked very well for our needs (primarily Extract/Load), but this utility could be extended quite a bit to support many more use cases. Feedback is welcome, but not all use cases are guaranteed to be addressed or supported. I hope this helps someone! https://github.com/AdWerx/meltano-kubernetes-ext
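For readers who want a feel for the idea, here is a minimal sketch of turning schedule entries into CronJob manifests. It is not the extension's actual implementation. The function names, the example schedule, the image tag, and the choice of `meltano run <job>` as the container command are all assumptions made for illustration. The output is JSON, which kubectl accepts alongside YAML.

```python
import json
import re


def cronjob_name(schedule_name: str) -> str:
    """Turn a schedule name into a string usable as a Kubernetes object name."""
    name = re.sub(r"[^a-z0-9-]+", "-", schedule_name.lower()).strip("-")
    # CronJob names are limited to 52 characters.
    return name[:52].rstrip("-")


def cronjob_manifest(schedule: dict, image: str) -> dict:
    """Build one batch/v1 CronJob manifest from one schedule entry.

    Assumes each entry carries `name`, `interval` (a cron expression) and `job`.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": cronjob_name(schedule["name"])},
        "spec": {
            "schedule": schedule["interval"],
            # Assumption: skip a run if the previous one is still going.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {"spec": {"template": {"spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "meltano",
                    "image": image,
                    "command": ["meltano", "run", schedule["job"]],
                }],
            }}}},
        },
    }


# Hypothetical schedule entry and image tag, for illustration only.
schedules = [
    {"name": "daily_gitlab_load", "interval": "0 6 * * *", "job": "gitlab-to-postgres"},
]
manifests = [cronjob_manifest(s, "registry.example.com/meltano-project:abc123") for s in schedules]
print(json.dumps(manifests, indent=2))
```

A kustomize overlay can then patch things like resource limits or secrets onto these generated base manifests without the job author touching any infrastructure code.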
p
This is really good @Josh Bielick. Thanks for sharing. Can you share how you guys are deploying Meltano in Kubernetes?
m
this is really interesting - thank you for sharing! We are running Meltano in Kubernetes using Argo Workflows. I wrote tooling using the Hera SDK to scaffold out all the CronWorkflows and WorkflowTemplates and then generate the YAML manifests from them (which are synced to the live clusters by ArgoCD).
j
@Pramod Kumar let me know whether this answers your question: We run a managed PostgreSQL server for Meltano's backend state. We use the aforementioned plugin/utility
AdWerx/meltano-kubernetes-ext
to generate CronJob manifests during CI/CD and use
kubectl apply -k
in CI/CD to apply them to the cluster. CI/CD could also commit these generated manifests if we were aiming for something more GitOps-oriented. We don't run the Meltano UI, so there are no Kubernetes resources other than the CronJobs.
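As a rough illustration of that apply step, here is a small sketch of how a CI job might wrap the kubectl call. The helper names and the overlay directory are hypothetical, and it assumes kubectl is on the PATH with a working kubeconfig. The `--dry-run=client` flag lets CI validate the overlay without changing the cluster.

```python
import shlex
import subprocess
from typing import List


def kubectl_apply_command(overlay_dir: str, dry_run: bool = False) -> List[str]:
    """Build the kubectl invocation for a kustomize overlay directory."""
    cmd = ["kubectl", "apply", "-k", overlay_dir]
    if dry_run:
        # Client-side dry run: render and validate without touching the cluster.
        cmd.append("--dry-run=client")
    return cmd


def apply_overlay(overlay_dir: str, dry_run: bool = False) -> None:
    """Run kubectl and fail the CI step if it exits non-zero."""
    cmd = kubectl_apply_command(overlay_dir, dry_run)
    print("+", shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Hypothetical overlay path; only the command is printed here, nothing is applied.
print(shlex.join(kubectl_apply_command("orchestrate/kubernetes/production", dry_run=True)))
```

In a pipeline you would call `apply_overlay(...)` once with `dry_run=True` on pull requests and once without it on the main branch.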
p
Thanks @Josh Bielick, that will for sure help me understand the different patterns in which Meltano is being used. In my case, I am trying to run Meltano together with Airflow. I read a few pointers on how to do that in one of the old threads in this channel. Will give it a try. Thanks again for responding to my query.
j
We use Airflow for a variety of scheduled data flows, but when thinking about how to run Meltano on a recurring basis, I found it much easier to just run it as a CronJob. We could probably run Meltano as a pod via Airflow and the KubernetesPodOperator, but I was more comfortable avoiding the complexity of Airflow and just running CronJobs in the cluster.
j
Has anyone here considered using Dagster to orchestrate Meltano, dbt, etc.? I mean, Dagster gives you end-to-end lineage, alerting, etc. It also provides a concept of pipes: it can invoke an external service, which may be literally anything.