# singer-tap-development
s
Hi all, I wanted an opinion on designing taps. Say I am writing a tap-redshift plugin and I intend to use it with target-s3. In this case, instead of streaming data directly in JSON format, it is much more efficient to do an `UNLOAD` operation in Redshift in one go with the select query, which also supports parallelism and other operations. Similarly, if I want to write data to Postgres, I could use `COPY` to be more efficient (Redshift -> S3 -> Postgres). So when designing a tap, there are optimisations possible depending on which target I choose to pair with it. My question is: should we think about these concerns when developing a new plugin?
v
I think BATCH would be interesting to you; it's for utilizing those target-specific "speed" enhancements, designed for exactly what you're talking about here: https://sdk.meltano.com/en/latest/batch.html
I think in your case, if you implemented BATCH for `tap-redshift` and `target-postgres`, you could get to the point where you could just run `meltano run tap-redshift target-postgres` and the data wouldn't even flow through your Python application.
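A rough sketch of what that might look like in meltano.yml (the batch_config keys follow the SDK's batch docs; the bucket and prefix are placeholders, and both plugins would need to be SDK-based with BATCH support):

```yaml
plugins:
  extractors:
    - name: tap-redshift
      config:
        # Ask the tap to emit BATCH messages pointing at files in S3
        # instead of individual RECORD messages.
        batch_config:
          encoding:
            format: jsonl
            compression: gzip
          storage:
            root: "s3://my-bucket/meltano-batches"
            prefix: "orders-"
  loaders:
    - name: target-postgres
      # A BATCH-aware target can bulk-load those files (e.g. via COPY)
      # rather than inserting row by row.
```

With that in place, `meltano run tap-redshift target-postgres` would mostly just pass file references between the two plugins.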
s
Oh, I didn't know batching could help with optimisations like these. Thanks 🙏 Need batching for incremental sync!
v
@silverbullet1 take a peek at https://github.com/meltano/sdk/issues/976#issuecomment-1621538738 / tap-snowflake