# troubleshooting
j
Hey y'all, suspect I'm doing something wrong here: I have N upstream Postgres databases and would like to ETL their outputs into N different schemas in Snowflake. I thought I could use the `inherit_from` trick for each of the extractors/loaders to share some common base config in a single `meltano.yml` file, but that seemed to blow up the size of my install/Docker container, because Meltano appears to install N separate copies of the dependent libraries, one for each of the `inherit_from` extractors/loaders. Am I doing this wrong? Should I just have a single `tap-postgres` extractor and then run N different meltano jobs, one for each of the DBs I need to extract from? And how should I dynamically adjust the target schema in Snowflake for each one: is it a matter of setting `TARGET_SNOWFLAKE_DEFAULT_TARGET_SCHEMA` differently for each job?
a
@josh_wills You're doing all the things right. This issue describes our proposed improvement to reuse the install rather than repeat it for each `inherit_from` instance.
> Should I just have a single `tap-postgres` extractor and then run N different meltano jobs, one for each of the DBs I need to extract from?
You could certainly use this approach (as a workaround while waiting for the above), especially if you need to prioritize image size and/or install time.
> And how should I dynamically adjust the target schema in Snowflake for each one: is it a matter of setting `TARGET_SNOWFLAKE_DEFAULT_TARGET_SCHEMA` differently for each job?
Yes, this should work for your use case. In the long run we plan to reuse install directories, but overriding the target schema via environment variables gives you a solid workaround in the meantime.
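For example, a single run might look something like this (a sketch only; the schema name is a placeholder, and the env var name follows from the `default_target_schema` setting of the transferwise `target-snowflake` variant):

```sh
# Override the loader's target schema for just this run (placeholder schema name).
TARGET_SNOWFLAKE_DEFAULT_TARGET_SCHEMA=SCHEMA_FOR_DB_1 \
  meltano elt tap-postgres target-snowflake
```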
(Shameless plug 😉:) If you've got cycles to contribute, we're also accepting Merge Requests on the above issue. It affects others as well, so delivering it would be a big win for image size and install time. 🙂
e
I do what you've described and it looks like this:
```yaml
  - name: target-snowflake
    variant: transferwise
    config:
      account: my_account.us-east-1
      user: my_username
      snowflake_role: LOADER
      dbname: RAW
      warehouse: LOADING
      file_format: RAW.CB_CORE.CSV
      default_target_schema: NO_MANS_LAND
      logging_level: DEBUG
      batch_size_rows: 1000000
      add_metadata_columns: true
      primary_key_required: false
      query_tag: meltano-loader-schema-table
  - name: target-snowflake-cb_chatlog
    inherit_from: target-snowflake
    config:
      default_target_schema: CB_CHATLOG
  - name: target-snowflake-cb_photos
    inherit_from: target-snowflake
    config:
      default_target_schema: CB_PHOTOS
```
We set the default to `NO_MANS_LAND` to indicate that it isn't to be used.
j
Wanted to close the loop on this one: I ended up implementing my default Postgres tap and then writing a shell script to update/override the environment variables that needed to change for the different DBs before calling `meltano elt` for the tap/target combo. It was an easier hack than either a) blowing up the size of my container by using lots of `inherit_from` blocks or b) fixing Meltano to not duplicate the install of taps that share a pip_url (sounds like a hella handy feature that I sadly do not have the time to implement at the moment!)
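Roughly, that kind of loop looks like this (a sketch with placeholder database and schema names; the exact `TAP_POSTGRES_*` and `TARGET_SNOWFLAKE_*` variable names depend on the tap and target variants' settings):

```sh
#!/usr/bin/env bash
# Sketch of a per-database ELT loop: one meltano elt run per upstream DB,
# with the tap's database and the loader's target schema overridden via env vars.
set -euo pipefail

# db_name:target_schema pairs, one entry per upstream Postgres database (placeholders).
DBS="db_one:SCHEMA_ONE db_two:SCHEMA_TWO"

for entry in $DBS; do
  db="${entry%%:*}"
  schema="${entry##*:}"

  TAP_POSTGRES_DBNAME="$db" \
  TARGET_SNOWFLAKE_DEFAULT_TARGET_SCHEMA="$schema" \
    meltano elt tap-postgres target-snowflake
done
```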