Hi community I am a big fan of separation of concerns and to Meltano #best-practices

magnus_avitsland

08/29/2022, 7:43 AM

Hi community, I am a big fan of “separation of concerns”, and on that topic I have a question:
• Do I separate the different extractors by using different plugins and configs within the same Meltano project? I.e.:
  ◦ cd big-meltano-project
    ▪︎ meltano run tap-google-sheet--sheet1 target-bigquery
    ▪︎ meltano run tap-google-sheet--sheet2 target-bigquery
    ▪︎ meltano run tap-some-totally-different-plugin--instance1 target-bigquery
    ▪︎ meltano run tap-some-totally-different-plugin--instance2 target-bigquery
• Or do I split this into entirely separate Meltano projects? I.e.:
  ◦ cd meltano-project-google-sheet-1
    ▪︎ meltano run tap-google-sheet target-bigquery
  ◦ cd meltano-project-google-sheet2
    ▪︎ meltano run tap-google-sheet target-bigquery
What is the best practice for this? (I run everything in GKE (Kubernetes), orchestrated by GCP Composer via GKEStartPodOperator.) Thanks in advance :)
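To make the two options concrete, here is a minimal Python sketch of the command each scheduled pod would run under each layout. It assumes each Composer task simply executes `meltano run <tap> <target>` from the project directory (for example, passed through the operator's `cmds`/`arguments`); the directory and plugin names are the placeholders from the question, and the two helper functions are hypothetical.

```python
from typing import List, Tuple

# Target used by every pipeline in the question.
TARGET = "target-bigquery"


def single_project_commands(project_dir: str, taps: List[str]) -> List[Tuple[str, List[str]]]:
    """One Meltano project: same working dir for every pod, only the tap name varies."""
    return [(project_dir, ["meltano", "run", tap, TARGET]) for tap in taps]


def multi_project_commands(project_dirs: List[str], tap: str) -> List[Tuple[str, List[str]]]:
    """One Meltano project per pipeline: tap name is constant, working dir varies."""
    return [(d, ["meltano", "run", tap, TARGET]) for d in project_dirs]


if __name__ == "__main__":
    for cwd, argv in single_project_commands(
        "big-meltano-project",
        ["tap-google-sheet--sheet1", "tap-google-sheet--sheet2"],
    ):
        print(cwd, " ".join(argv))
    for cwd, argv in multi_project_commands(
        ["meltano-project-google-sheet-1", "meltano-project-google-sheet2"],
        "tap-google-sheet",
    ):
        print(cwd, " ".join(argv))
```

In the single-project layout one working directory (and likely one container image) serves every pipeline and only the plugin name changes; in the multi-project layout the plugin name stays the same and the project directory changes instead.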

aaronsteers

08/29/2022, 5:54 PM

Just my personal take here, but I want the project to be a full list of everything in the dependency tree. If none of your EL pipelines have downstream consumers, and each is atomic, then I think a separate project per pipeline is okay, although it is probably hard to keep DRY if you have interdependent settings and storage backends. Most projects tend to involve a lot of reuse and merging, cross-source comparisons, and cross-source lookups, in which case you are generally best served by a single project that encompasses everything related to each other. So it could be typical to have a single project cover the entirety of a data warehouse's inputs, outputs, transformations, validations, and reports.

Still, within a single-project framing, and to address "separation of concerns", you can have different CODEOWNERS for different subfolders: one team can own the `extractors` folder while another team owns `transforms/sales` and a third team owns `transforms/finance`. By keeping these in the same project, you are more likely to detect when a change to extractors breaks a downstream transform, for instance, even if different teams are managing each.
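The point about catching downstream breakage can be sketched as a graph walk: starting from a changed extractor, collect every transform that reads from it directly or indirectly, then map those paths to their owning teams. This minimal Python illustration uses a hypothetical dependency graph, hypothetical model paths, and hypothetical team handles. The owner lookup mimics GitHub CODEOWNERS' rule that the last matching pattern wins, but only supports plain directory prefixes, not globs. In a real project, a tool such as dbt would derive the dependency graph for you.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency graph: node -> nodes that read from it.
DOWNSTREAM: Dict[str, List[str]] = {
    "extractors/tap-google-sheet--sheet1": ["transforms/sales/stg_orders"],
    "extractors/tap-google-sheet--sheet2": ["transforms/finance/stg_budget"],
    "transforms/sales/stg_orders": ["transforms/finance/revenue_vs_budget"],
    "transforms/finance/stg_budget": ["transforms/finance/revenue_vs_budget"],
}

# Hypothetical CODEOWNERS-style rules (directory prefixes only).
# As in GitHub CODEOWNERS, the LAST matching rule takes precedence.
OWNER_RULES = [
    ("extractors/", "@org/data-platform"),
    ("transforms/sales/", "@org/sales-analytics"),
    ("transforms/finance/", "@org/finance-analytics"),
]


def affected_by(changed: str) -> Set[str]:
    """Breadth-first walk collecting every node downstream of `changed`."""
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in DOWNSTREAM.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def owner_of(path: str) -> str:
    """Return the owner from the last rule whose prefix matches `path`."""
    owner = "unowned"
    for prefix, team in OWNER_RULES:
        if path.startswith(prefix):
            owner = team  # later matches override earlier ones
    return owner


def teams_to_notify(changed: str) -> Set[str]:
    """Owners of everything downstream of the changed node."""
    return {owner_of(p) for p in affected_by(changed)}


if __name__ == "__main__":
    changed = "extractors/tap-google-sheet--sheet1"
    print(sorted(affected_by(changed)))
    print(sorted(teams_to_notify(changed)))
```

Here a change to the sheet1 extractor reaches both a sales transform and a finance transform, so both teams are flagged, which is the kind of cross-team breakage that is easy to miss when the pieces live in separate projects.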

aaronsteers

08/29/2022, 5:54 PM

Those are just my personal thoughts. Others surely will have different takes. Hope it helps!

magnus_avitsland

08/31/2022, 2:36 PM

Thanks, it did help! We’re starting off with a single project, with different versions, i.e. inherited plugins for the different sources. We haven’t tried the subfolder approach yet (e.g. /extractors/sales).

magnus_avitsland

09/01/2022, 1:51 PM

In a while I will experiment with how we handle different sources that use the same plugin, i.e.:
• Put each source in its own subfolder, e.g. extractor/subfolder-google-sheet1
• Use plugin inheritance, e.g. tap-google-sheet--sheet1 and tap-google-sheet--sheet2
TBD... Thanks again for the input :)
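The two ideas can be combined. This Python sketch builds meltano.yml-style entries in which each sheet is an inherited plugin (`inherit_from`) stored in its own sub-file under `extractors/`, with the root file pulling the sub-files in via Meltano's `include_paths` setting (check that your Meltano version supports splitting the project file this way). The `spreadsheet_id` setting, the IDs, and the file names are placeholders, and a real base plugin entry would also need its `variant`/`pip_url`. In an actual project these dicts would be written out as YAML.

```python
import json
from typing import Dict

# Base plugin name as written in this thread; the real tap may be named differently.
BASE_TAP = "tap-google-sheet"


def inherited_extractor(suffix: str, spreadsheet_id: str) -> Dict:
    """Return a meltano.yml-style extractor entry that inherits from BASE_TAP.

    `spreadsheet_id` is a placeholder setting name (assumption); use the
    setting your tap actually documents.
    """
    return {
        "name": f"{BASE_TAP}--{suffix}",
        "inherit_from": BASE_TAP,
        "config": {"spreadsheet_id": spreadsheet_id},
    }


def split_layout(sheets: Dict[str, str]) -> Dict[str, Dict]:
    """Map each file path to its content: the root file plus one sub-file per sheet."""
    files: Dict[str, Dict] = {
        "meltano.yml": {
            # Root file pulls in the per-source sub-files (assumes include_paths support).
            "include_paths": ["./extractors/*.meltano.yml"],
            # Trimmed base plugin entry; a real one also needs variant/pip_url.
            "plugins": {"extractors": [{"name": BASE_TAP}]},
        }
    }
    for suffix, sheet_id in sheets.items():
        files[f"extractors/{suffix}.meltano.yml"] = {
            "plugins": {"extractors": [inherited_extractor(suffix, sheet_id)]}
        }
    return files


if __name__ == "__main__":
    layout = split_layout({"sheet1": "SHEET1_ID", "sheet2": "SHEET2_ID"})
    for path, content in layout.items():
        print(f"# {path}")
        print(json.dumps(content, indent=2))
```

Each inherited plugin is then run by name, as in the original question, e.g. meltano run tap-google-sheet--sheet1 target-bigquery.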
