adam_rudd
10/04/2021, 10:51 PM
- Primary Meltano instance runs all pipelines and jobs for all the things
- Teams adopt taps/targets with their specific configurations
- Those taps/targets are added to the main instance, along with team-specific pipelines that use (inherited) configs for that team, so taps
- centralised logging, alerting, etc.
- always up; configurations mounted to some location (e.g. a deployment connected to a GitHub repo for the YAML)
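A minimal sketch of how the centralised setup might look as a single shared `meltano.yml`, assuming Meltano's plugin inheritance (`inherit_from`) for the per-team configs and the `schedules` section for the pipelines. All plugin names, hosts, database names, and config values are hypothetical, and the available config keys depend on the plugin variant actually installed.

```yaml
# Shared meltano.yml for the one central Meltano project (sketch; all names hypothetical)
version: 1
plugins:
  extractors:
  # Base extractor shared by every team.
  # (variant/pip_url lines written by `meltano add` are omitted for brevity)
  - name: tap-postgres
    config:
      host: db.internal.example.com
      port: 5432
  # Team-specific extractors inherit the base plugin and override only what differs.
  - name: tap-postgres--team-a
    inherit_from: tap-postgres
    config:
      dbname: team_a
  - name: tap-postgres--team-b
    inherit_from: tap-postgres
    config:
      dbname: team_b
  loaders:
  # Hypothetical loader; config keys vary by target-bigquery variant.
  - name: target-bigquery
    config:
      project_id: my-gcp-project
      dataset_id: raw
schedules:
# One schedule per team pipeline, all run from the same instance.
- name: team-a-postgres-to-bigquery
  extractor: tap-postgres--team-a
  loader: target-bigquery
  transform: skip
  interval: '@daily'
- name: team-b-postgres-to-bigquery
  extractor: tap-postgres--team-b
  loader: target-bigquery
  transform: skip
  interval: '@hourly'
```

Because every team's pipeline lives in one file (which could be the GitHub-backed YAML mentioned above), logging, alerting, and any orchestrator such as Airflow see all schedules in one place.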
pros:
- observability centralised
- makes the ui useful
- lower overhead for maintenance
cons:
- build failures / job failures may impact other jobs (not sure, I haven't read up on this)
- higher risk if the system itself fails, since every pipeline depends on the one instance
- meatier instances required
^ Is this scenario best practice? Working with our current k8s infra, I’m feeling the pull of deploying single pipelines across multiple pods, since I can just fire them up via cron (below).
02. Distributed scenario
- Multiple Meltano instances, one per pipeline (e.g. a BigQuery -> S3 pipeline)
- each instance handles a single data-movement requirement
- state still stored in some external DB as above, possibly in separate tables
- option to basically run on demand, using cron to trigger the jobs
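A rough sketch of one distributed pipeline as a Kubernetes CronJob that runs `meltano elt` on a schedule. Assumptions: a hypothetical image that already contains the Meltano project with its plugins installed (working directory `/project`), hypothetical plugin names `tap-bigquery` and `target-s3`, and a hypothetical Secret `meltano-db` holding the connection string for the shared external system database, passed in via `MELTANO_DATABASE_URI`.

```yaml
# One CronJob per pipeline (sketch; image, plugin, and secret names are hypothetical)
apiVersion: batch/v1          # use batch/v1beta1 on clusters older than Kubernetes 1.21
kind: CronJob
metadata:
  name: bigquery-to-s3
spec:
  schedule: "0 2 * * *"       # every day at 02:00 (controller's time zone)
  concurrencyPolicy: Forbid   # skip a run if the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 1         # retry a failed pod once, then mark the Job failed
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: meltano
            # Hypothetical image built from the Meltano project, plugins pre-installed
            image: registry.example.com/data/meltano-bigquery-to-s3:latest
            workingDir: /project
            # --job_id keys the incremental state; newer Meltano versions use --state-id
            command: ["meltano", "elt", "tap-bigquery", "target-s3", "--job_id=bigquery-to-s3"]
            env:
            # Point every instance at the same external system DB so state survives pod restarts
            - name: MELTANO_DATABASE_URI
              valueFrom:
                secretKeyRef:
                  name: meltano-db
                  key: uri
```

Each pipeline gets its own CronJob (applied with `kubectl apply -f`), so a failing pod only affects that one pipeline. The flip side is that job history and logs are spread across separate Jobs and pods.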
pros:
- lighter, simpler to deploy, can trigger via CLI commands with basic cron jobs
- risk distributed across multiple instances: if something breaks on one instance, the others are unaffected
cons:
- Observability will need to span the x deployed instances (additional work to set up)
- UI basically useless since there will likely be only 1 pipeline/job in each
Best practice feels like centralised from a long-term perspective, but from a POC/MVP perspective I’m feeling like it’d be much easier to deploy with scenario 02.
Thoughts and feedback much appreciated here.
visch
10/05/2021, 11:59 AM
visch
10/05/2021, 12:00 PM
edgar_ramirez_mondragon
10/05/2021, 2:45 PM
meltano elt ... commands. It's nice until it isn't: a job fails and you miss having the history and UI of something like Airflow. So, like @visch said, if I had the resources to leverage something like Airflow + K8s operators, I'd go for that.
adam_rudd
10/05/2021, 9:37 PM
taylor
10/06/2021, 4:35 PM
adam_rudd
10/06/2021, 8:04 PM