# troubleshooting
s
Hi people! Is there a way to speed up the data sync? I am currently testing the transfer from Redshift to Postgres (SDK connectors) and I am getting roughly 20-30k records per minute 😞 Which params can I tweak here? Thanks!
@pat_nadolny / @visch any idea on this? Sorry to tag individually 🙈
t
which Redshift connector are you using? I'm not aware of one that's SDK-based. If there were one, you could use BATCH messages to really speed it up: https://sdk.meltano.com/en/latest/batch.html Redshift is on our list to rebuild on the SDK in the coming months!
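For reference, an SDK-based tap advertises BATCH support through its `batch_config` setting. A minimal sketch of what that might look like, following the linked docs (the storage root and prefix here are illustrative placeholders):

```python
# Minimal sketch of a Singer SDK tap config enabling BATCH messages,
# based on https://sdk.meltano.com/en/latest/batch.html.
# The storage root/prefix are placeholders; point them at a path
# (or an s3:// bucket) that both the tap and the target can reach.
config = {
    # ...your usual tap-redshift settings (host, user, etc.)...
    "batch_config": {
        "encoding": {
            "format": "jsonl",       # records are written as JSONL files
            "compression": "gzip",   # gzipped before hitting storage
        },
        "storage": {
            "root": "file:///tmp/batches",  # must be readable by the target too
            "prefix": "batch-",
        },
    },
}
```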
s
@taylor I am trying to develop the SDK variant. Let me check out the batch messages, thanks 🙂 So it's just the tap that needs optimisation, correct? Are there any other params which control the frequency at which we extract/load, etc.?
u
It’s likely a bit of both. There’s probably something on both the tap and target that can be done to speed up processing. @edgar_ramirez_mondragon you know the SDK settings better than I do. Perhaps something around the number of records to hold before writing to the DB?
e
I don’t think there’s an out-of-the-box way to control the batch size in targets built with the SDK, but I logged https://github.com/meltano/sdk/issues/1626 a while ago. The `MAX_SIZE_DEFAULT` can be set by the target developer, but there’s no way currently to let the user control that.
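As a sketch of what this could look like on the target side: a developer can raise the flush threshold by overriding `MAX_SIZE_DEFAULT` on their sink class (the sink name and value below are illustrative, not from an existing target):

```python
# Hedged sketch: raising the records-per-flush threshold in an SDK target.
# The SDK's Sink drains its buffer once max_size records accumulate; the
# default is 10,000 at the time of writing. PostgresSink is illustrative.
from singer_sdk.sinks import SQLSink


class PostgresSink(SQLSink):
    """Sink that holds more records in memory before each write."""

    MAX_SIZE_DEFAULT = 50_000  # trade memory for fewer, larger writes
```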
s
@taylor looks like batching doesn’t currently work with incremental sync as per https://github.com/meltano/sdk/issues/976, so I would likely hold off on it for now
What is a good number of records my tap should be able to extract in a given timeframe (say, 1 minute)? I am noticing that on my local machine, my tap-redshift can copy ~50k records to target-jsonl and ~40k records to target-postgres. I am assuming the target side is close to ideal with target-jsonl, since it just needs to dump records to a file. Also, will these numbers improve if I run the pipeline on a higher-spec machine (more CPU and RAM)?
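One way to answer the tap-vs-target question empirically is to time the tap in isolation by counting the RECORD messages it emits, with no target attached. A rough sketch, assuming the tap's CLI is named `tap-redshift` and reads a `config.json`:

```python
# Hedged sketch: measure raw tap throughput by counting Singer RECORD
# messages on stdout, removing the target from the equation entirely.
import json
import subprocess
import time

proc = subprocess.Popen(
    ["tap-redshift", "--config", "config.json"],  # assumed CLI name/flags
    stdout=subprocess.PIPE,
    text=True,
)
count, start = 0, time.time()
for line in proc.stdout:
    try:
        if json.loads(line).get("type") == "RECORD":
            count += 1
    except json.JSONDecodeError:
        continue  # skip any non-JSON log lines
proc.wait()
elapsed = time.time() - start
print(f"{count} records in {elapsed:.1f}s (~{count / elapsed * 60:.0f}/min)")
```

If that number is well above 40-50k/min, the target is the bottleneck; if it matches, the tap (or the Redshift query itself) is. Throughput here tends to be bound by network and the database more than by local CPU/RAM, so a beefier machine alone may not move the numbers much.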