# troubleshooting
benjamin_maquet
Hi team, anyone encountered a case where 2 taps (e.g. `tap-salesforce` and `tap-zendesk`) share the same `stream` name? How do you deal with such cases? The target would write to the same file/table, wouldn't it? We haven't encountered it yet, but thinking about the future and wondering what the community would do in such cases…? Thanks!
dan_ladd
In my case, target-bigquery allows a `table_prefix` in the config, so our tables end up being `zendesk_users` and `salesforce_users`.
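As a minimal sketch of that approach (assuming your target-bigquery variant exposes the `table_prefix` setting described above; the prefix values are just examples):

```
# Set a static prefix in the project config
# ("meltano config <plugin> set <name> <value>" is standard Meltano)
meltano config target-bigquery set table_prefix zendesk_

# Or override it for a single run via Meltano's
# <PLUGIN_NAME>_<SETTING_NAME> environment-variable convention
TARGET_BIGQUERY_TABLE_PREFIX=salesforce_ meltano elt tap-salesforce target-bigquery
```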
nick_hamlin
I ran into this with redshift and addressed it by having the target put each data source in a separate schema. More details/context in this thread: https://meltano.slack.com/archives/CFG3C3C66/p1617224371207200?thread_ts=1617223616.203600&cid=CFG3C3C66
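A rough sketch of that pattern (the schema setting name varies by target-redshift variant, so treat `TARGET_REDSHIFT_SCHEMA` as illustrative):

```
# Route each source into its own Redshift schema by overriding the
# target's schema setting per pipeline (setting name is illustrative)
TARGET_REDSHIFT_SCHEMA=zendesk    meltano elt tap-zendesk target-redshift
TARGET_REDSHIFT_SCHEMA=salesforce meltano elt tap-salesforce target-redshift
```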
benjamin_maquet
thanks! @dan_ladd so do you maintain one `target-bigquery` for each of your taps? Or do you pass the `table_prefix` at runtime using Meltano? If the latter, how do you do it?
@nick_hamlin we also rely on plugin inheritance. We are building Meltano on Docker, and when we have a lot of targets, the Docker image size gets really big, which is causing us some issues. So we are trying to reduce the number of targets we install 🙂 A nice solution would be the ability to add/update a target config at runtime (in the same way that `--select` can be used to select an object/stream, it would be nice to be able to specify a schema/table name at runtime, without needing to install multiple targets and relying on plugin inheritance…). @douwe_maan, would love your opinion on this!
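For reference, the inheritance pattern being discussed looks roughly like this (plugin names are just examples; an inheriting plugin can reuse the parent's installation unless it declares its own `pip_url`):

```
# Create a second loader that inherits target-bigquery's installation
# but carries its own configuration
meltano add loader target-bigquery--zendesk --inherit-from target-bigquery
meltano config target-bigquery--zendesk set table_prefix zendesk_
```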
dan_ladd
The latter: we have our Meltano project wrapped up in Docker and run it for each job, passing in the prefix at runtime.
benjamin_maquet
How are you passing this prefix at runtime? Do you manually pass a `config.json` and have the prefix in there? AFAIK only a few options are allowed at runtime and they are listed here. Am I missing something? 😛
dan_ladd
We pass it as an environment variable, `TARGET_BIGQUERY_TABLE_PREFIX`, to the image.
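A sketch of what that invocation could look like (the image name and job ID are hypothetical):

```
# Meltano maps <PLUGIN_NAME>_<SETTING_NAME> env vars onto plugin settings,
# so the prefix can be injected per container run
docker run -e TARGET_BIGQUERY_TABLE_PREFIX=salesforce_ \
  my-meltano-image \
  meltano elt tap-salesforce target-bigquery --job_id salesforce-to-bigquery
```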
benjamin_maquet
Got it! I think we should be able to implement that as well. Thanks a lot!
c
@benjamin_maquet we use Chamber, which lets us easily keep all taps and targets in one Docker image but then hydrate specific env vars for each `elt` run, including overriding the schema name if we want.
If you’re wondering how that works, you set up parameter store like this:
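A sketch with illustrative keys and values, following the `chamber write` syntax shown further down:

```
# Vars shared by every pipeline (MELTANO_DATABASE_URI points Meltano's
# system database at the separate RDS Postgres mentioned below)
chamber write meltano MELTANO_DATABASE_URI postgresql://user:pass@host:5432/meltano

# Target-level vars
chamber write meltano/target-redshift TARGET_REDSHIFT_BATCH_SIZE_ROWS 500

# Tap-level vars, hydrated last so they can override target vars per-tap
# (the schema setting name is illustrative)
chamber write meltano/tap-zendesk TARGET_REDSHIFT_SCHEMA zendesk
```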
And then execute your container like this:
```
chamber exec meltano meltano/target-redshift meltano/tap-zendesk -- meltano elt tap-zendesk target-redshift --job_id zendesk-to-redshift
```
so that will hydrate all the `meltano/` vars first, followed by `meltano/target-redshift` vars next, and then `meltano/tap-zendesk` last (this allows you to override target vars per-tap). Chamber doesn't recurse downwards.
then you build a Docker image that's agnostic to anything and simply controlled by your container scheduler + Chamber
oh and it’s super easy to just set the vars from your own dev machine:
```
chamber write meltano/tap-zendesk TARGET_REDSHIFT_BATCH_SIZE_ROWS 500
```
Make sure you set up a separate RDS Postgres for persistence, and you're in a world of such pure containers it'll make you cry 🙂