Henning Holgersen
10/20/2022, 11:56 AM
include_paths to keep things tidy (or possibly just distribute the messiness). This might not be possible, and it might be a really bad idea, but the basic idea is:
`meltano.yaml`: Declares the taps and targets themselves, but no config. Imports the YAML files in the extractors/ and loaders/ folders.
└ `extractors/server1.meltano.yaml`: Extractors that inherit from the ones declared in meltano.yaml. Adds configuration such as server name, password, etc.
└ `extractors/server1/data1.meltano.yaml`: Extractors that inherit from the ones declared in the parent extractors folder and specify exactly which datasets are extracted.
└ `loaders/loaders.meltano.yaml`: Same as for extractors…
So basically, conceptually, there are three levels of config files:
• level 1: meltano.yaml declares the different taps, with no config, just a base definition.
• level 2: Taps with configs for specific servers/services, useful when you have several instances of Snowflake, Postgres, etc. that you load from.
• level 3: Taps that inherit from a specific service and specify the datasets/endpoints/etc. to be loaded. This allows us to load different schemas/datasets from the same source on different cadences.
Only the “level 3” taps would actually be invoked.
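A minimal sketch of how those three levels might look, assuming Meltano's `include_paths` and `inherit_from` keys and assuming that chained inheritance resolves across included files. The project file is written here under its standard name `meltano.yml`. The plugin names (`tap-postgres--server1`, `tap-postgres--server1-data1`), the host, the user, and the stream pattern are all hypothetical, and the setting names depend on the tap variant in use.

```yaml
# meltano.yml (level 1): base plugin declarations only, no config
version: 1
include_paths:
  - ./extractors/*.meltano.yaml
  - ./extractors/*/*.meltano.yaml
  - ./loaders/*.meltano.yaml
plugins:
  extractors:
    - name: tap-postgres
      # variant and pip_url as written by `meltano add extractor tap-postgres`
---
# extractors/server1.meltano.yaml (level 2): connection details for one server
plugins:
  extractors:
    - name: tap-postgres--server1         # hypothetical name
      inherit_from: tap-postgres
      config:
        host: server1.example.com         # hypothetical host
        user: meltano                     # hypothetical user
        # keep the password out of the file, e.g. in .env
---
# extractors/server1/data1.meltano.yaml (level 3): what to extract
plugins:
  extractors:
    - name: tap-postgres--server1-data1   # hypothetical name
      inherit_from: tap-postgres--server1
      select:
        - public-orders.*                 # hypothetical stream pattern
```

With a layout like this, only the level 3 name would be passed to `meltano run` or `meltano elt`.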
I haven’t gotten this to work yet, I don’t know if it is even possible, and I might be really, really overthinking it. Are there easier ways to go about this? Although I’m using docker/prefect for the loads, I want to stick with a single meltano project if possible.
pat_nadolny
10/20/2022, 3:25 PM
include_paths allow you to organize in whatever way your team prefers. The inheritance feature installs the plugin only once (previously each child got its own venv), so install time and disk space aren't an issue when spreading the config out like this. I've done what you're describing on a smaller scale, and I end up leaving a few configs at each level, because some settings are generic enough to apply to all children (and can still be overridden in a child if needed), which avoids re-configuring the same value over and over. Although sometimes, for readability, it's nicer to have the configs all set in the lowest-level child.
aaronsteers
10/20/2022, 6:09 PM
/meltano.yml
/environments/dev.meltano.yml
/environments/staging.meltano.yml
/environments/prod.meltano.yml
/extractors/<domain1>.meltano.yml
/extractors/<domain2>.meltano.yml
/loaders/*.meltano.yml
...
Rather than breaking up extractors into tiers or levels of inheritance, I would try to keep them grouped together by topic area and/or by internal team name or function names.
For example, if you have several instances of tap-slack for connecting to different slack sites, putting them all in .../extractors/slack.meltano.yml will make debugging a lot easier when they are inheriting from each other - and if you need to change all of them, you can change them all in one place.
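As a rough sketch of that grouping, assuming Meltano's `inherit_from` key: one file holds the base `tap-slack` plugin plus one child per Slack site, so shared settings live in one place and each child only overrides what differs. The child names, the environment variable names, and the setting names (`start_date`, `api_key`) are illustrative assumptions and depend on the tap variant in use.

```yaml
# extractors/slack.meltano.yml
plugins:
  extractors:
    - name: tap-slack
      # variant and pip_url as written by `meltano add extractor tap-slack`
      config:
        start_date: "2022-01-01"            # shared default, assumed setting name
    - name: tap-slack--community            # hypothetical: one Slack site
      inherit_from: tap-slack
      config:
        api_key: $SLACK_COMMUNITY_API_KEY   # hypothetical env var
    - name: tap-slack--internal             # hypothetical: another Slack site
      inherit_from: tap-slack
      config:
        api_key: $SLACK_INTERNAL_API_KEY    # hypothetical env var
        start_date: "2020-01-01"            # overrides the inherited default
```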
As the team and the repo grow, you can also group subfolders according to which team controls/maintains which files. So, hubspot and google analytics might be nested in a marketing subfolder, for instance, and then you can use CODEOWNERS to govern who is allowed to approve changes in each subfolder.
aaronsteers
10/20/2022, 6:10 PM
Henning Holgersen
10/20/2022, 8:12 PM
christoph
10/20/2022, 8:36 PM
target-jsonl
2. One file per data source in extract/ folder. Each file specifies everything: Base plugin, inherited plugin (when needed), and common config (we hardly have to differentiate between dev/prod in any of the sources)
3. In the rare case that a "base" plugin might need to be inherited by multiple data sources (which hasn't happened yet), the base plugin will simply get its own file in the extract/ folder.
Point three highlights the other important principle we follow:
Refactor early and often
Henning Holgersen
10/20/2022, 8:57 PM
include_paths.
Refactor early and often is good advice, I just hope we can keep it up as the project matures.
christoph
10/20/2022, 9:01 PM
Simplicity--the art of maximizing the amount of work not done--is essential.