# troubleshooting
e
What is the best way to increase the rate of calls to the target API? Klaviyo allows 300+ API calls per second and we are trying to optimize this tap. Currently our Meltano integration is running at 1/10 the speed of Stitch.
a
To confirm, is this using a new tap-klaviyo built on the SDK? Also, when using Stitch, were you using the "v1" connector described here? I checked the Singer GitHub repo for their tap and I don't see any extra parallelization or async handling. We specifically logged this issue because of performance limitations mentioned as related to the Klaviyo API itself - but I don't have any explanation for why you'd get better performance from the Singer/Stitch version of the tap, since it doesn't do any parallel or async work that I could find. Also - assuming this is SDK-based, you might get some boost from the latest version, which had some performance improvements. If you can confirm that the SDK version number is the latest, along with some other specifics around the endpoint being synced, it would be much appreciated.
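For reference, something like this should show whether the tap is SDK-based and which singer-sdk version it's on (a rough sketch - the `--about` flag only exists on SDK-based taps, and the venv path assumes Meltano's default plugin install layout):

```bash
# SDK-based taps expose --about, which prints metadata including the SDK version;
# a non-SDK tap will typically reject the flag.
meltano invoke tap-klaviyo --about --format=json

# Alternatively, inspect the plugin's virtualenv directly
# (path assumes Meltano's default .meltano/ install layout).
.meltano/extractors/tap-klaviyo/venv/bin/pip show singer-sdk
```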
e
Thanks for the help as always @aaronsteers. I'm not sure if it is built off the SDK -- will check it out. We're using this Klaviyo tap: https://github.com/singer-io/tap-klaviyo It is the same as the "v1" you mentioned. Stitch's Klaviyo tap is technically community-maintained, so I don't think it's a special proprietary one of theirs.
a
Interesting... yeah, then I'd expect performance to be the same unless Stitch has a "v2" that's not public.
e
There's a drastic difference, which makes me think we're doing something obviously wrong. What would be some key info we could provide to help troubleshoot? We're currently organizing logs and record counts from comparable periods. Other integrations are very comparable with Stitch.
a
If you still have access to the Stitch environment, I think it would be helpful to have the timing data both ways, with record counts, from one isolated stream - preferably a full table sync so we don't have to worry about differing saved state and starting points.
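Something like this would isolate a single stream and force a full sync on the Meltano side (a sketch - the `receive` stream name and job id are just examples, and flags may vary by Meltano version):

```bash
# Select only one stream so the comparison is isolated
# ("receive" is just an example entity name).
meltano select tap-klaviyo receive "*"
meltano select tap-klaviyo --list   # verify what is now selected

# Run it end to end, ignoring saved state so it's a full-table sync.
meltano elt tap-klaviyo target-redshift --job_id=klaviyo-benchmark --full-refresh
```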
e
Yes, we do! @tom_mcgrail and @kai_yokoyama -- let's sync on this so we can get it organized for the Meltano team.
a
Also helpful to know how you are landing the data, in order to ensure the target isn't holding up the process. The cleanest way to measure tap performance will always be to use `meltano invoke` and push STDOUT straight to a local file - but that may be overkill.
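For example (a rough sketch - the file name is arbitrary, and the exact serialization of the RECORD messages may differ slightly):

```bash
# Time the tap alone, writing Singer messages straight to a local file
# so the target can't slow it down.
time meltano invoke tap-klaviyo > klaviyo_output.jsonl

# Rough record count, for a records-per-second comparison against Stitch.
grep -c '"type": "RECORD"' klaviyo_output.jsonl
```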
The other thing that comes to mind is the network connection strength of the runner. Since Stitch is obviously using a runner in AWS or equivalent, it will be less subject to network "roundtrip" latencies than something running on a local laptop on a home network, for instance.
e
We are using target-redshift from datamill: https://github.com/datamill-co/target-redshift Thanks for the tip re: `meltano invoke` - we'll compare speeds when outputting to a local file.
We are currently hosting Meltano on a Linux EC2 instance (AWS).
Hey @aaronsteers, we're working on the above, but a quick question: below is the process output of the Linux box running Meltano. I highlighted the job for Klaviyo, and it looks like the 'load' process takes much more time and memory than the 'extract' (or maybe I'm misinterpreting the 'l' and 'e'). Is this normal?
The two highlighted columns on the left are VSZ and RSS respectively; the highlighted column on the right is time.
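For reference, we're pulling roughly those columns with something like this (the grep patterns are just our guess at how the tap and target processes are named):

```bash
# VSZ and RSS are reported in KiB; etime is elapsed wall-clock time.
ps -eo pid,vsz,rss,etime,args | grep -E 'tap-klaviyo|target-redshift' | grep -v grep
```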
e
The `datamill` variant denests everything. That usually consumes time and a fair amount of memory.
e
Got it -- even if our schema is only a limited number of columns? We noticed that Stitch is denesting 'receive' more than datamill, but that could be because of some setting. Here are pics:
Meltano:
(only one receive table)
Stitch:
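(For the comparison we're just listing the tables each pipeline created, roughly like this - schema names are placeholders:)

```bash
# List the tables each pipeline wrote to its Redshift schema
# (schema names below are placeholders).
psql "$REDSHIFT_DSN" -c "
  SELECT table_schema, table_name
  FROM information_schema.tables
  WHERE table_schema IN ('meltano_klaviyo', 'stitch_klaviyo')
  ORDER BY 1, 2;"
```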
@edgar_ramirez_mondragon thanks for that explanation!! It explains why the loader is taking much more time. We are beginning to build our own Redshift target (which we'd make available to the community). We're digging into datamill and pipelinewise, which are the two Redshift targets we're using now.
a
@edgar_ramirez_mondragon's explanation about denesting may also explain some of the slowness and the additional memory consumption seen on the target side.