# getting-started
e
Does anyone have advice for keeping config DRY? I have many single-tenant databases and am using the `tap-postgres` extractor and `inherit_from` to define an extractor for each database, but it is rather tedious and duplicative.
v
Hard to know exactly without more deets. How many tenant dbs? Can you show an example `meltano.yml` of what's bothering you?
e
Here's what my `meltano.yml` looks like so far:
```yaml
version: 1
default_environment: dev
project_id: 87c29cde-3184-41dc-a242-f29e823e8926
environments:
- name: dev
- name: staging
- name: prod
plugins:
  # Extractors
  extractors:
  # This is the default extractor for the `tap-postgres` tap
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
    config:
      flattening_enabled: True
      flattening_max_depth: 5
      filter_schemas:
      - public
      stream_maps:
        public-Alarm:
          source: '""'
    select:
    - public-Alarm.*
  # Begin customer-specific extractors
  - name: tap-postgres--customer1
    inherit_from: tap-postgres
    config:
      database: customer1
      stream_maps:
        public-Alarm:
          source: '"customer1"'
  - name: tap-postgres--customer2
    inherit_from: tap-postgres
    config:
      database: customer2
      stream_maps:
        public-Alarm:
          source: '"customer2"'
  - name: tap-postgres--customer3
    inherit_from: tap-postgres
    config:
      database: customer3
      stream_maps:
        public-Alarm:
          source: '"customer3"'
  - name: tap-postgres--customer4
    inherit_from: tap-postgres
    config:
      database: customer4
      stream_maps:
        public-Alarm:
          source: '"customer4"'
  - name: tap-postgres--customer5
    inherit_from: tap-postgres
    config:
      database: customer5
      stream_maps:
        public-Alarm:
          source: '"customer5"'
  - name: tap-postgres--customer6
    inherit_from: tap-postgres
    config:
      database: customer6
      stream_maps:
        public-Alarm:
          source: '"customer6"'
  - name: tap-postgres--customer7
    inherit_from: tap-postgres
    config:
      database: customer7
      stream_maps:
        public-Alarm:
          source: '"customer7"'
  - name: tap-postgres--customer8
    inherit_from: tap-postgres
    config:
      database: customer8
      stream_maps:
        public-Alarm:
          source: '"customer8"'
  - name: tap-postgres--customer9
    inherit_from: tap-postgres
    config:
      database: customer9
      stream_maps:
        public-Alarm:
          source: '"customer9"'
  - name: tap-postgres--customer10
    inherit_from: tap-postgres
    config:
      database: customer10
      stream_maps:
        public-Alarm:
          source: '"customer10"'
  # Loaders
  loaders:
  # This is the default loader for the `target-duckdb` target
  - name: target-duckdb
    variant: jwills
    pip_url: target-duckdb~=0.6
    config:
      filepath: ./output/warehouse.duckdb
      default_target_schema: main
  # Transformers
  transformers:
  - name: dbt-duckdb
    variant: jwills
    pip_url: dbt-core~=1.6.0 dbt-duckdb~=1.6.0
    config:
      path: ./output/warehouse.duckdb
  # Utilities
  utilities:
  - name: superset
    variant: apache
    pip_url: apache-superset==3.0.0 duckdb-engine==0.9.2
    config:
      WTF_CSRF_ENABLED: False
jobs:
- name: all
  tasks:
  - tap-postgres--customer1 target-duckdb
  - tap-postgres--customer2 target-duckdb
  - tap-postgres--customer3 target-duckdb
  - tap-postgres--customer4 target-duckdb
  - tap-postgres--customer5 target-duckdb
  - tap-postgres--customer6 target-duckdb
  - tap-postgres--customer7 target-duckdb
  - tap-postgres--customer8 target-duckdb
  - tap-postgres--customer9 target-duckdb
  - tap-postgres--customer10 target-duckdb
```
Ideally I'd like a more succinct way to configure extractors, e.g. by passing a list of databases.
For only 10 customers, this is not so bad. But this won't work for enumerating each of our customer databases. I suppose I could generate the `meltano.yml` from a script?
v
Yeah, I get it: once you're past 3-5 of those overrides it's not really the best tool. I think env vars would get you what you're after, and the complexity trade-off moves to the orchestrator. The `meltano.yml` would look something like this:
```yaml
version: 1
default_environment: dev
project_id: 87c29cde-3184-41dc-a242-f29e823e8926
environments:
- name: dev
- name: staging
- name: prod
plugins:
  # Extractors
  extractors:
  # This is the default extractor for the `tap-postgres` tap
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
    config:
      flattening_enabled: True
      flattening_max_depth: 5
      filter_schemas:
      - public
      stream_maps:
        public-Alarm:
          source: '""'
    select:
    - public-Alarm.*
  # Begin customer-specific extractors
  - name: tap-postgres--customer
    inherit_from: tap-postgres
    config:
      database: ${CUSTOMER_NUMBER}
      stream_maps:
        public-Alarm:
          source: '"${CUSTOMER_NUMBER}"'
  # Loaders
  loaders:
  # This is the default loader for the `target-duckdb` target
  - name: target-duckdb
    variant: jwills
    pip_url: target-duckdb~=0.6
    config:
      filepath: ./output/warehouse.duckdb
      default_target_schema: main
  # Transformers
  transformers:
  - name: dbt-duckdb
    variant: jwills
    pip_url: dbt-core~=1.6.0 dbt-duckdb~=1.6.0
    config:
      path: ./output/warehouse.duckdb
  # Utilities
  utilities:
  - name: superset
    variant: apache
    pip_url: apache-superset==3.0.0 duckdb-engine==0.9.2
    config:
      WTF_CSRF_ENABLED: False
jobs:
- name: all
  tasks:
  - tap-postgres--customer target-duckdb
```
Then in your orchestrator you'd set the env var `CUSTOMER_NUMBER=2`, `CUSTOMER_NUMBER=3`, etc., which gets you what you're after. The only thing I'm not 100% sure of is whether the nested `stream_maps` object would expand the env var properly; I know the `database` config would work.
The other thing: do you care about incremental runs and state being stored properly? If so, this will need some tweaks.
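Rough sketch of what the orchestrator side could look like (pure illustration: the customer list is made up, I'm passing the full database name rather than just a number so it drops into both `database` and the stream map, and the `--state-id-suffix` flag for keeping per-tenant state separate is worth verifying against your Meltano version):

```python
# Illustrative orchestrator loop: run the single inherited extractor once per
# tenant, injecting CUSTOMER_NUMBER through the environment so each run points
# at a different database.
import os
import subprocess

customers = ["customer1", "customer2", "customer3"]  # illustrative list

for db in customers:
    env = {**os.environ, "CUSTOMER_NUMBER": db}
    subprocess.run(
        [
            "meltano", "run",
            # assumption: check that your Meltano version supports this flag
            # before relying on it for per-tenant incremental state
            "--state-id-suffix", db,
            "tap-postgres--customer", "target-duckdb",
        ],
        env=env,
        check=True,  # stop on the first tenant that fails
    )
```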
e
Thanks, this looks more like what I want.
v
Some folks say they're going to have 100s of clients, etc. I tell everyone to just start with `inherit_from` until you actually hit 5+, then swap it.
e
For orchestration, would you recommend using the built-in Airflow? We've had better luck using other tooling.
v
Whatever tool your team currently uses is what I'd recommend!
Otherwise GitHub Actions, GitLab, cron, etc.