# troubleshooting
j
Hey y'all, suspect I'm doing something wrong here: I have N upstream Postgres databases and would like to ETL their outputs into N different schemas in Snowflake. I thought I could use the `inherit_from` trick for each of the extractors/loaders to share some common base config in a single `meltano.yml` file, but that seemed to blow up the size of my install/Docker container, because Meltano appears to install N separate copies of the dependent libraries, one for each of the `inherit_from` extractors/loaders. Am I doing this wrong? Should I just have a single `tap-postgres` extractor and then run N different meltano jobs, one for each of the DBs I need to extract from? And how should I dynamically adjust the target schema in Snowflake for each one: is it a matter of setting `TARGET_SNOWFLAKE_DEFAULT_TARGET_SCHEMA` differently for each job?
a
@josh_wills You're doing all the things right. This issue describes our proposed improvement to reuse the install rather than repeat it for each `inherit_from` instance.
> Should I just have a single `tap-postgres` extractor and then run N different meltano jobs, one for each of the DBs I need to extract from?
You could certainly use this approach (as a workaround while waiting for the above), especially if you need to prioritize image size and/or install time.
> And how should I dynamically adjust the target schema in Snowflake for each one: is it a matter of setting `TARGET_SNOWFLAKE_DEFAULT_TARGET_SCHEMA` differently for each job?
Yes, this should work for your use case. In the long run we plan to reuse install directories, but overriding the target schema via environment variables gives you a solid workaround in the meantime.
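For example, a single run might look something like this (a sketch only; the schema name is a placeholder, and the env var name follows from the `default_target_schema` setting of the transferwise `target-snowflake` variant):

```sh
# Override the loader's target schema for just this run (placeholder schema name).
TARGET_SNOWFLAKE_DEFAULT_TARGET_SCHEMA=SCHEMA_FOR_DB_1 \
  meltano elt tap-postgres target-snowflake
```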
(Shameless plug 😉:) If you've got cycles to contribute, we're also accepting Merge Requests on the above issue. It affects others as well, so delivering it would be a big win for image size and install time. 🙂
e
I do what you've described and it looks like this:
```yaml
  - name: target-snowflake
    variant: transferwise
    config:
      account: my_account.us-east-1
      user: my_username
      snowflake_role: LOADER
      dbname: RAW
      warehouse: LOADING
      file_format: RAW.CB_CORE.CSV
      default_target_schema: NO_MANS_LAND
      logging_level: DEBUG
      batch_size_rows: 1000000
      add_metadata_columns: true
      primary_key_required: false
      query_tag: meltano-loader-schema-table
  - name: target-snowflake-cb_chatlog
    inherit_from: target-snowflake
    config:
      default_target_schema: CB_CHATLOG
  - name: target-snowflake-cb_photos
    inherit_from: target-snowflake
    config:
      default_target_schema: CB_PHOTOS
```
We set the default to `NO_MANS_LAND` to indicate that it isn't to be used.
j
Wanted to close the loop on this one: I ended up implementing my default Postgres tap and then writing a shell script to update/override the environment variables that needed to change for the different DBs before calling `meltano elt` for the tap/target combo. It was an easier hack than either a) blowing up the size of my container by using lots of `inherit_from` blocks or b) fixing Meltano to not duplicate the install of taps that share a pip_url (sounds like a hella handy feature that I sadly do not have the time to implement at the moment!)
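Roughly, that kind of loop looks like this (a sketch with placeholder database and schema names; the exact `TAP_POSTGRES_*` and `TARGET_SNOWFLAKE_*` variable names depend on the tap and target variants' settings):

```sh
#!/usr/bin/env bash
# Sketch of a per-database ELT loop: one meltano elt run per upstream DB,
# with the tap's database and the loader's target schema overridden via env vars.
set -euo pipefail

# db_name:target_schema pairs, one entry per upstream Postgres database (placeholders).
DBS="db_one:SCHEMA_ONE db_two:SCHEMA_TWO"

for entry in $DBS; do
  db="${entry%%:*}"
  schema="${entry##*:}"

  TAP_POSTGRES_DBNAME="$db" \
  TARGET_SNOWFLAKE_DEFAULT_TARGET_SCHEMA="$schema" \
    meltano elt tap-postgres target-snowflake
done
```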