# singer-tap-development
s
Hi all, I wanted an opinion on designing taps. Say I am writing a tap-redshift plugin and I intend to use it with target-s3. In this case, instead of streaming data directly in JSON format, it is much more efficient to do an `UNLOAD` operation in Redshift in one go with the select query, which also supports parallelism and other operations. Similarly, if I want to write data to Postgres, I could use `COPY` to be more efficient (Redshift -> S3 -> Postgres). So when designing a tap, there are optimisations possible depending on which target I choose to pair with it. My question is: should we think about these concerns when developing a new plugin?
v
I think BATCH would be interesting to you; it's for utilizing those target-specific "speed" enhancements, designed for exactly what you're talking about here: https://sdk.meltano.com/en/latest/batch.html
I think in your case, if you implemented BATCH for `tap-redshift` and `target-postgres`, you could get to the point where you could just run `meltano run tap-redshift target-postgres` and the data wouldn't even flow through your Python application.
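A rough sketch of what that might look like in meltano.yml (the batch_config keys follow the SDK's batch docs; the bucket and prefix are placeholders, and both plugins would need to be SDK-based with BATCH support):

```yaml
plugins:
  extractors:
    - name: tap-redshift
      config:
        # Ask the tap to emit BATCH messages pointing at files in S3
        # instead of individual RECORD messages.
        batch_config:
          encoding:
            format: jsonl
            compression: gzip
          storage:
            root: "s3://my-bucket/meltano-batches"
            prefix: "orders-"
  loaders:
    - name: target-postgres
      # A BATCH-aware target can bulk-load those files (e.g. via COPY)
      # rather than inserting row by row.
```

With that in place, `meltano run tap-redshift target-postgres` would mostly just pass file references between the two plugins.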
s
Oh, I didn't know batching could help with optimisations like these. Thanks 🙏 Need batching for incremental sync!
v
@silverbullet1 take a peek at https://github.com/meltano/sdk/issues/976#issuecomment-1621538738 / tap-snowflake