# getting-started
e
Does anyone have advice for keeping config DRY? I have many single-tenant databases and am using the `tap-postgres` extractor and `inherit_from` to define an extractor for each database, but it is rather tedious and duplicative.
v
Hard to know exactly without more deets. How many tenant dbs? Can you show an example `meltano.yml` of what's bothering you?
e
Here's what my `meltano.yml` looks like so far:
```yaml
version: 1
default_environment: dev
project_id: 87c29cde-3184-41dc-a242-f29e823e8926
environments:
- name: dev
- name: staging
- name: prod
plugins:
  # Extractors
  extractors:
  # This is the default extractor for the `tap-postgres` tap
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
    config:
      flattening_enabled: True
      flattening_max_depth: 5
      filter_schemas:
      - public
      stream_maps:
        public-Alarm:
          source: '""'
    select:
    - public-Alarm.*
  # Begin customer-specific extractors
  - name: tap-postgres--customer1
    inherit_from: tap-postgres
    config:
      database: customer1
      stream_maps:
        public-Alarm:
          source: '"customer1"'
  - name: tap-postgres--customer2
    inherit_from: tap-postgres
    config:
      database: customer2
      stream_maps:
        public-Alarm:
          source: '"customer2"'
  - name: tap-postgres--customer3
    inherit_from: tap-postgres
    config:
      database: customer3
      stream_maps:
        public-Alarm:
          source: '"customer3"'
  - name: tap-postgres--customer4
    inherit_from: tap-postgres
    config:
      database: customer4
      stream_maps:
        public-Alarm:
          source: '"customer4"'
  - name: tap-postgres--customer5
    inherit_from: tap-postgres
    config:
      database: customer5
      stream_maps:
        public-Alarm:
          source: '"customer5"'
  - name: tap-postgres--customer6
    inherit_from: tap-postgres
    config:
      database: customer6
      stream_maps:
        public-Alarm:
          source: '"customer6"'
  - name: tap-postgres--customer7
    inherit_from: tap-postgres
    config:
      database: customer7
      stream_maps:
        public-Alarm:
          source: '"customer7"'
  - name: tap-postgres--customer8
    inherit_from: tap-postgres
    config:
      database: customer8
      stream_maps:
        public-Alarm:
          source: '"customer8"'
  - name: tap-postgres--customer9
    inherit_from: tap-postgres
    config:
      database: customer9
      stream_maps:
        public-Alarm:
          source: '"customer9"'
  - name: tap-postgres--customer10
    inherit_from: tap-postgres
    config:
      database: customer10
      stream_maps:
        public-Alarm:
          source: '"customer10"'
  # Loaders
  loaders:
  # This is the default loader for the `target-duckdb` target
  - name: target-duckdb
    variant: jwills
    pip_url: target-duckdb~=0.6
    config:
      filepath: ./output/warehouse.duckdb
      default_target_schema: main
  # Transformers
  transformers:
  - name: dbt-duckdb
    variant: jwills
    pip_url: dbt-core~=1.6.0 dbt-duckdb~=1.6.0
    config:
      path: ./output/warehouse.duckdb
  # Utilities
  utilities:
  - name: superset
    variant: apache
    pip_url: apache-superset==3.0.0 duckdb-engine==0.9.2
    config:
      WTF_CSRF_ENABLED: False
jobs:
- name: all
  tasks:
  - tap-postgres--customer1 target-duckdb
  - tap-postgres--customer2 target-duckdb
  - tap-postgres--customer3 target-duckdb
  - tap-postgres--customer4 target-duckdb
  - tap-postgres--customer5 target-duckdb
  - tap-postgres--customer6 target-duckdb
  - tap-postgres--customer7 target-duckdb
  - tap-postgres--customer8 target-duckdb
  - tap-postgres--customer9 target-duckdb
  - tap-postgres--customer10 target-duckdb
```
Ideally I'd like a more succinct way to configure extractors, e.g. by passing a list of databases.
For only 10 customers, this is not so bad. But this won't work for enumerating each of our customer databases. I suppose I could generate the `meltano.yml` from a script?
v
Yeah, I get it: once you're past 3-5 of those overrides it's not really the best tool. I think env vars would get you what you're after, and the complexity trade-off moves to the orchestrator. The `meltano.yml` would look something like this:
```yaml
version: 1
default_environment: dev
project_id: 87c29cde-3184-41dc-a242-f29e823e8926
environments:
- name: dev
- name: staging
- name: prod
plugins:
  # Extractors
  extractors:
  # This is the default extractor for the `tap-postgres` tap
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
    config:
      flattening_enabled: True
      flattening_max_depth: 5
      filter_schemas:
      - public
      stream_maps:
        public-Alarm:
          source: '""'
    select:
    - public-Alarm.*
  # Begin customer-specific extractors
  - name: tap-postgres--customer
    inherit_from: tap-postgres
    config:
      database: ${CUSTOMER_NUMBER}
      stream_maps:
        public-Alarm:
          source: '"${CUSTOMER_NUMBER}"'
  # Loaders
  loaders:
  # This is the default loader for the `target-duckdb` target
  - name: target-duckdb
    variant: jwills
    pip_url: target-duckdb~=0.6
    config:
      filepath: ./output/warehouse.duckdb
      default_target_schema: main
  # Transformers
  transformers:
  - name: dbt-duckdb
    variant: jwills
    pip_url: dbt-core~=1.6.0 dbt-duckdb~=1.6.0
    config:
      path: ./output/warehouse.duckdb
  # Utilities
  utilities:
  - name: superset
    variant: apache
    pip_url: apache-superset==3.0.0 duckdb-engine==0.9.2
    config:
      WTF_CSRF_ENABLED: False
jobs:
- name: all
  tasks:
  - tap-postgres--customer target-duckdb
```
Then in your orchestrator you'd set the env var `CUSTOMER_NUMBER=2`, `CUSTOMER_NUMBER=3`, etc., which gets you what you're after. The only thing I'm not 100% sure of is whether the nested `stream_maps` object would expand the env var properly; I know the `database` config would work.
The other thing: do you care about incremental runs and state being stored properly? If so, this will need some tweaks.
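Rough sketch of what the orchestrator side could look like (pure illustration: the customer list is made up, I'm passing the full database name rather than just a number so it drops into both `database` and the stream map, and the `--state-id-suffix` flag for keeping per-tenant state separate is worth verifying against your Meltano version):

```python
# Illustrative orchestrator loop: run the single inherited extractor once per
# tenant, injecting CUSTOMER_NUMBER through the environment so each run points
# at a different database.
import os
import subprocess

customers = ["customer1", "customer2", "customer3"]  # illustrative list

for db in customers:
    env = {**os.environ, "CUSTOMER_NUMBER": db}
    subprocess.run(
        [
            "meltano", "run",
            # assumption: check that your Meltano version supports this flag
            # before relying on it for per-tenant incremental state
            "--state-id-suffix", db,
            "tap-postgres--customer", "target-duckdb",
        ],
        env=env,
        check=True,  # stop on the first tenant that fails
    )
```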
e
Thanks, this looks more like what I want.
v
Some folks say they're going to have 100s of clients, etc. I tell everyone to just start with `inherit_from` until you actually hit 5+, then swap it.
e
For orchestration, would you recommend using the built-in Airflow? We've had better luck using other tooling.
v
Whatever tool your team currently uses is what I'd recommend!
Otherwise GitHub Actions, GitLab, cron, etc.