# best-practices
a
Hi all, quick question following up on some of the work above:
• Targets appear to be tightly coupled to config or ENV vars.
• If we plan to re-use targets, is there a recommended approach here? e.g.:
pipeline1: pulling from Salesforce and loading via target-s3-csv to an S3 bucket `salesforce-data` with account role ARN:123
pipeline2: pulling from Jira and loading via target-s3-csv to an S3 bucket `jira-data` with account role ARN:456
If dockerizing this, I'm under the impression that we should have a single `meltano` container with a bunch of workers which are used for the actual processing. TL;DR: what's the recommended approach for re-using targets with different configs? (Writing to the same location is not feasible for us, so unfortunately we cannot consider that option.)
b
For re-use you can use `inherit_from:`. It works for taps/targets; you only have to change the configuration. Something like this:
  - name: target-redshift
    variant: transferwise
    pip_url: pipelinewise-target-redshift
    config:
      primary_key_required: false
      batch_size_rows: 100000
  - name: target-redshift-raw
    inherit_from: target-redshift
    config:
      primary_key_required: true
      batch_size_rows: 250000
      parallelism: -1
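You could then point a pipeline at whichever inherited loader you need. A minimal sketch in meltano.yml (the schedule and tap names here are just placeholders, not from the thread):

schedules:
  - name: salesforce-to-redshift-raw
    extractor: tap-salesforce
    loader: target-redshift-raw   # the inherited loader defined above
    transform: skip
    interval: '@daily'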
a
Legend!
e
@boggdan_barrientos's solution is probably the more robust one, especially if the config values will be hardcoded once the Docker image is baked. Another approach, which database loaders use, is to default the target schema/bucket/etc. to `$MELTANO_EXTRACT__LOAD_SCHEMA` (target-postgres does this, for example). That variable is filled in at runtime from the extractor definition: https://meltano.com/docs/plugins.html#load-schema-extra. So if you have an extractor for Salesforce with namespace `salesforce`, one for Jira with namespace `jira`, and a target-s3-csv that defines `s3_bucket: $MELTANO_EXTRACT__LOAD_SCHEMA`, then each source will land in a separate bucket in S3.
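Roughly, that might look like this in meltano.yml (the tap names are placeholders; the namespaces match the example above, and the extractor's load_schema extra defaults to its namespace):

extractors:
  - name: tap-salesforce
    namespace: salesforce   # load_schema defaults to the namespace -> bucket "salesforce"
  - name: tap-jira
    namespace: jira
loaders:
  - name: target-s3-csv
    config:
      s3_bucket: $MELTANO_EXTRACT__LOAD_SCHEMA   # resolved per extractor at run time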