My `meltano.yml` is getting very large, is there a...
# best-practices
e
My `meltano.yml` is getting very large, is there a recommended practice for breaking it up into smaller files that can be referenced/imported?
h
You can split your Meltano plugin specification across multiple files. In the main file, use `include_paths` like below:
```yaml
include_paths:
  - "./plugin_definitions1/*.meltano.yml"
  - "./plugin_definitions2/*.meltano.yml"
```
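For context, each file matched by those globs is itself a `meltano.yml` fragment with the usual top-level keys. A minimal sketch of what one such file could contain (the path, plugin name, and columns here are hypothetical):

```yaml
# plugin_definitions1/tap-postgres.meltano.yml (hypothetical path)
plugins:
  extractors:
    - name: tap-postgres
      select:
        - public-users.id
        - public-users.email
```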
e
thanks!
it looks like there's currently no way to use `include_paths` within a plugin?
h
what do you mean?
e
for example, replacing the `select:` within my extractor with a reference to a file
e
Hi @Ellis Valentiner!
> replacing the `select:` within my extractor with a reference to a file

That last bit is not currently possible, but I'd like to explore the option of publishing a pkl module for programmatically building `meltano.yml`, so a user could split their project and reuse as much as they'd need.
h
Hmm. Could you help me understand the use case for looking up the `select` block from a different file?
e
We are replicating dozens of tables, most of which have dozens of columns. We can use the `table.*` syntax for some, but would prefer to declaratively list each column to be included/excluded. We don't want to include new columns automatically, for data privacy reasons, and we've encountered problems replicating when columns are added to the underlying tables. Listing each column leads to a very large `meltano.yml` file. Currently we have only 1 extractor and 1 loader configuration, but we expect that to change as well. We'd prefer to store this information in separate files (with some sort of directory structure) that could be imported and reused in different `select` blocks for different extractor/loader tasks.
h
I see. So would the ergonomic goal be the ability to specify each table in its own file?
e
Yes. I've been thinking about this, and one of the challenges we've had in developing our ELT is that adding new tables requires us to update the `stream_maps`, `select`, and `metadata` blocks of our extractor. I can see use cases where that would be desirable, but we'd rather have a single, centralized place to manage these.
I mean that we would like to model the entity/table we are replicating in a single place, so that the stream maps, selects, and metadata configuration are all stored together.
e
I think at this point your best option for that would be to use a proper programming language to generate your `meltano.yml`. In the long term, either a pkl module or a proper client (in Python or any language) generated from the JSON schema seems like the best solution, rather than adding complexity to Meltano's YAML parsing and resolution capabilities. Do log an issue if you'd like to see something like that, or even something else entirely 🙂
s
@Edgar Ramírez (Arch.dev) - when you say use a programming language to generate `meltano.yml`, do you mean split the file into extractor bases and merge them into one when running?
e
I mean using something like Python to generate the desired objects and serializing them to YAML.
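A minimal sketch of that generation approach, keeping each table's columns and replication settings in one place and deriving the extractor's `select` and `metadata` blocks from them. The table specs, stream names, and `tap-postgres` name are all made up; in a real project each spec would live in its own file (e.g. one YAML file per table) and PyYAML's `yaml.safe_dump` would write `meltano.yml` — `json` is used here only to keep the sketch stdlib-only:

```python
# Sketch: derive one extractor's select/metadata blocks from per-table specs
# so each table is modeled in a single place. Hypothetical names throughout.
import json

# Hypothetical per-table specs; in practice, one small file per table.
TABLES = {
    "public-users": {"columns": ["id", "email", "created_at"],
                     "replication_key": "created_at"},
    "public-orders": {"columns": ["id", "user_id", "total"],
                      "replication_key": None},
}

def build_extractor(name: str, tables: dict) -> dict:
    """Assemble a Meltano-style extractor dict from per-table specs."""
    select, metadata = [], {}
    for stream, spec in tables.items():
        # Explicitly list every column instead of relying on table.*
        select.extend(f"{stream}.{col}" for col in spec["columns"])
        meta = {"replication-method":
                "INCREMENTAL" if spec["replication_key"] else "FULL_TABLE"}
        if spec["replication_key"]:
            meta["replication-key"] = spec["replication_key"]
        metadata[stream] = meta
    return {"name": name, "select": select, "metadata": metadata}

config = {"plugins": {"extractors": [build_extractor("tap-postgres", TABLES)]}}
# In practice: yaml.safe_dump(config) -> meltano.yml
print(json.dumps(config, indent=2))
```

Adding a table then means adding one spec, and the `select`/`metadata` blocks stay in sync automatically.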
👍 1
m
It’s possible to use YAML anchors and aliases in `meltano.yml` files. I’ve used this to reduce duplicated config before (although updates made by Meltano to the YAML file will expand them all, which made it awkward to manage).
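A small sketch of that anchor/alias idea, sharing one `select` list between two extractor definitions (plugin names and columns are made up):

```yaml
plugins:
  extractors:
    - name: tap-postgres
      select: &base_select     # anchor: define the shared list once
        - public-users.id
        - public-users.email
    - name: tap-postgres--backfill
      inherit_from: tap-postgres
      select: *base_select     # alias: reuse the same list
```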
👍 1
e
fwiw I did get around to trying pkl for modularizing `meltano.yml`: https://github.com/edgarrmondragon/meltano-dogfood/tree/main/pkl.
ty 1
s
Exactly what I was trying to figure out.