# infra-deployment
Emre Üstündağ
Hi. I initialized a Meltano project via the dockerized Meltano image, added extractors and targets, and also added Airflow as the orchestrator. Everything seems fine locally, but I wonder how to deploy it to a server environment so that I won't need to run it on my machine. From what I've read, I could use AWS ECS or EC2 to run Meltano on a server. In my local environment there are two running containers now: the Airflow scheduler and the Airflow UI. But I didn't understand some concepts. In https://docs.meltano.com/guide/containerization there is some info on running Meltano in containers. If I build a new Docker image, register it in AWS ECR, create a task definition from my Meltano project's image, and run two tasks from that task definition (one for the Airflow UI and one for the Airflow scheduler), will everything be fine? You may say "just try it", but I am not sure this approach is best practice 🙂

So I need to know how to set up this infrastructure with a proper deployment process. I also don't know how to create a CI/CD workflow; I will be working on that to automate the dev-deploy cycle.

Lastly, I am using target-clickhouse as a loader now. My ClickHouse DB runs in an AWS Fargate service. Can I use dbt with ClickHouse? If so, can I install dbt to run transformations inside the Meltano project, or should I use dbt Cloud to transform data outside the project?
Edgar Ramírez (Arch.dev)
Hi @Emre Üstündağ! I'll try to answer each of your questions:
If I build a new Docker image, register it in AWS ECR, create a task definition from my Meltano project's image, and run two tasks from that task definition (one for the Airflow UI and one for the Airflow scheduler), will everything be fine? You may say "just try it", but I am not sure this approach is best practice
That should work. As for what counts as "best practice": if you're on AWS, you might want to use a managed Airflow deployment with MWAA and run your Meltano image from ECR as an ECS task via the Amazon provider's ECS operator.
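A minimal sketch of what that DAG could look like (every resource name below is a placeholder, and it assumes the image's entrypoint is `meltano`, as in the containerization guide):

```python
# Hypothetical MWAA DAG that runs a Meltano command as a Fargate task.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

with DAG(
    dag_id="meltano_elt",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
):
    EcsRunTaskOperator(
        task_id="run_meltano_elt",
        cluster="meltano-cluster",            # placeholder ECS cluster
        task_definition="meltano-task",       # placeholder task definition
        launch_type="FARGATE",
        overrides={
            "containerOverrides": [
                {
                    "name": "meltano",        # container name in the task definition
                    # Entrypoint is `meltano`, so this runs `meltano run ...`
                    "command": ["run", "tap-example", "target-clickhouse"],
                }
            ]
        },
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-00000000"],  # placeholder subnet IDs
                "assignPublicIp": "ENABLED",
            }
        },
    )
```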
So I need to know how to set up this infrastructure with a proper deployment process. I also don't know how to create a CI/CD workflow.
That depends somewhat on what platform you're developing on and whether you're familiar with IaC tools like Terraform. I'm sure there are guides out there on how to do CI/CD for AWS infra using Terraform.
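For example, if your repo is on GitHub, a bare-bones build-and-push workflow is only a few lines. This sketch assumes GitHub Actions, an existing ECR repo named meltano-project, and credentials stored as repo secrets:

```yaml
# .github/workflows/deploy.yml: hypothetical build-and-push workflow.
# Repo name, region, and secret names are assumptions.
name: build-and-push
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push image
        run: |
          docker build -t ${{ steps.ecr.outputs.registry }}/meltano-project:${{ github.sha }} .
          docker push ${{ steps.ecr.outputs.registry }}/meltano-project:${{ github.sha }}
```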
Lastly, I am using target-clickhouse as a loader now. My ClickHouse DB runs in an AWS Fargate service. Can I use dbt with ClickHouse?
Yeah, although you'll have to create a custom plugin definition in your `meltano.yml` with a pip URL that points to both meltano-dbt-ext and dbt-clickhouse.
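Roughly along these lines; treat it as a sketch, since the exact packages and pins are up to you (the `dbt_invoker` executable follows the pattern of the Hub's dbt-postgres definition):

```yaml
# meltano.yml: hypothetical custom utility definition for dbt + ClickHouse.
# Pin versions in practice; the commands here are assumptions.
plugins:
  utilities:
    - name: dbt-clickhouse
      namespace: dbt_clickhouse
      pip_url: dbt-core dbt-clickhouse meltano-dbt-ext
      executable: dbt_invoker
      commands:
        run: run      # meltano invoke dbt-clickhouse:run
        test: test    # meltano invoke dbt-clickhouse:test
```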
should I use dbt Cloud to transform data outside the project?
That's honestly another option if you'd like the dbt Cloud UI.
Emre Üstündağ
Thank you @Edgar Ramírez (Arch.dev). I know my questions were a bit long, but you still answered patiently, and I appreciate it. These questions really come down to my concerns about cost-effectiveness. Anyway, I will try your suggestions; most likely I will face different challenges during deployment :)
Edgar Ramírez (Arch.dev)
most likely I will face different challenges during deployment 🙂
As expected with this sort of thing, it's hard to get exactly right the first time, but feel free to ask in #C0699V48BPF, #C069CSV7NHY or even #C069CQNHDNF.
Emre Üstündağ
Yes, you're absolutely right. Thank you!
Hi. I deployed my local Meltano project to AWS ECS. For now I've tried ECS scheduled tasks as the scheduler, and they work for each Meltano command. Of course, I haven't configured the system database or a state backend yet, so incremental replication state isn't read from any database the way the local `.meltano/meltano.db` is in a local project. I'm working on this, but there are some things I wonder about. If I create an ECS task running PostgreSQL and configure the database URI, will that be enough to track run history so that each run is incremental? Or do I need to configure additional settings such as the system db, going by this link: https://docs.meltano.com/guide/advanced-topics? I'd appreciate any other suggestions for running Meltano in the cloud in a simple way while tracking metadata like in local.
Edit: I think I've got the concepts now. I created a Postgres ECS instance to serve as the Meltano system database and configured S3 as the state backend, and I've tested both. Now I'm able to use Meltano in a server environment by running tasks in ECS. Development is still ongoing, and I think scheduling tasks in ECS is enough orchestration for my use case, because we need quick, ready-to-use data. If I need anything for the next steps, I'll share. Thanks again
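For reference, the relevant bits of config look roughly like this; the hostname and bucket are placeholders, and in ECS I'd pass the URI through the `MELTANO_DATABASE_URI` environment variable in the task definition rather than committing credentials:

```yaml
# meltano.yml: sketch of the system database + S3 state backend settings.
# Host, database name, and bucket are placeholders.
database_uri: postgresql://meltano:${POSTGRES_PASSWORD}@my-postgres-host:5432/meltano
state_backend:
  uri: s3://my-meltano-bucket/state
```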
Edgar Ramírez (Arch.dev)
That's awesome! Let me know if you need to set up logging; I've had success setting up a FireLens sidecar to forward Meltano logs in the past.
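The sidecar arrangement looks roughly like this in the task definition; treat it as a sketch, since the image URI, region, and log group below are placeholders:

```json
{
  "containerDefinitions": [
    {
      "name": "log_router",
      "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable",
      "essential": true,
      "firelensConfiguration": { "type": "fluentbit" }
    },
    {
      "name": "meltano",
      "image": "<account>.dkr.ecr.<region>.amazonaws.com/meltano-project:latest",
      "logConfiguration": {
        "logDriver": "awsfirelens",
        "options": {
          "Name": "cloudwatch_logs",
          "region": "us-east-1",
          "log_group_name": "/ecs/meltano",
          "log_stream_prefix": "meltano-",
          "auto_create_group": "true"
        }
      }
    }
  ]
}
```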