# troubleshooting
s
Hi people! Is there a way to speed up the data sync? I am currently testing the transfer from Redshift to Postgres (SDK connectors) and I am getting roughly 20-30k records per minute 😞 Which params can I tweak here? Thanks!
@pat_nadolny / @visch any idea on this? Sorry to tag individually 🙈
t
which Redshift connector are you using? I'm not aware of one that's SDK-based. If there were one, you could use BATCH messages to really speed it up: https://sdk.meltano.com/en/latest/batch.html Redshift is on our list to rebuild on the SDK in the coming months!
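For reference, an SDK-based tap advertises BATCH support through its `batch_config` setting. A minimal sketch of what that might look like, following the linked docs (the storage root and prefix here are illustrative placeholders):

```python
# Minimal sketch of a Singer SDK tap config enabling BATCH messages,
# based on https://sdk.meltano.com/en/latest/batch.html.
# The storage root/prefix are placeholders; point them at a path
# (or an s3:// bucket) that both the tap and the target can reach.
config = {
    # ...your usual tap-redshift settings (host, user, etc.)...
    "batch_config": {
        "encoding": {
            "format": "jsonl",       # records are written as JSONL files
            "compression": "gzip",   # gzipped before hitting storage
        },
        "storage": {
            "root": "file:///tmp/batches",  # must be readable by the target too
            "prefix": "batch-",
        },
    },
}
```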
s
@taylor I am trying to develop the SDK variant. Let me check out the batch messages, thanks 🙂 So it's just the tap that needs optimisation, correct? Are there any other params which control the frequency at which we extract/load, etc.?
u
It’s likely a bit of both. There’s probably something on both the tap and target that can be done to speed up processing. @edgar_ramirez_mondragon you know the SDK settings better than I do. Perhaps something around the number of records to hold before writing to the DB?
e
I don’t think there’s an out-of-the-box way to control the batch size in targets built with the SDK, but I logged https://github.com/meltano/sdk/issues/1626 a while ago. The `MAX_SIZE_DEFAULT` can be set by the target developer, but there’s no way currently to let the user control that.
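As a sketch of what this could look like on the target side: a developer can raise the flush threshold by overriding `MAX_SIZE_DEFAULT` on their sink class (the sink name and value below are illustrative, not from an existing target):

```python
# Hedged sketch: raising the records-per-flush threshold in an SDK target.
# The SDK's Sink drains its buffer once max_size records accumulate; the
# default is 10,000 at the time of writing. PostgresSink is illustrative.
from singer_sdk.sinks import SQLSink


class PostgresSink(SQLSink):
    """Sink that holds more records in memory before each write."""

    MAX_SIZE_DEFAULT = 50_000  # trade memory for fewer, larger writes
```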
s
@taylor looks like batching doesn’t currently work with incremental sync as per https://github.com/meltano/sdk/issues/976, so I would likely hold off on it for now
What is a good number of records my tap should be able to extract in a given timeframe (say, 1 minute)? I am noticing that on my local machine, my tap-redshift can copy ~50k records to target-jsonl and ~40k records to target-postgres. I am assuming the target side is close to ideal with target-jsonl, since it just needs to dump records to a file. Also, will these numbers improve if I run the pipeline on a higher-spec machine (more CPU and RAM)?
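One way to answer the tap-vs-target question empirically is to time the tap in isolation by counting the RECORD messages it emits, with no target attached. A rough sketch, assuming the tap's CLI is named `tap-redshift` and reads a `config.json`:

```python
# Hedged sketch: measure raw tap throughput by counting Singer RECORD
# messages on stdout, removing the target from the equation entirely.
import json
import subprocess
import time

proc = subprocess.Popen(
    ["tap-redshift", "--config", "config.json"],  # assumed CLI name/flags
    stdout=subprocess.PIPE,
    text=True,
)
count, start = 0, time.time()
for line in proc.stdout:
    try:
        if json.loads(line).get("type") == "RECORD":
            count += 1
    except json.JSONDecodeError:
        continue  # skip any non-JSON log lines
proc.wait()
elapsed = time.time() - start
print(f"{count} records in {elapsed:.1f}s (~{count / elapsed * 60:.0f}/min)")
```

If that number is well above 40-50k/min, the target is the bottleneck; if it matches, the tap (or the Redshift query itself) is. Throughput here tends to be bound by network and the database more than by local CPU/RAM, so a beefier machine alone may not move the numbers much.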