# troubleshooting
e
What is the best way to increase the rate of calls to the target API? Klaviyo allows 300+ API calls per second and we are trying to optimize this tap. Currently our Meltano integration is running at 1/10 the speed of Stitch.
a
To confirm, is this using a new tap-klaviyo built on the SDK? Also, when using Stitch, were you using the "v1" connector described here? I checked the Singer GitHub repo for their tap and I don't see any extra parallelization or async handling. We specifically logged this issue because of performance limitations mentioned as related to the Klaviyo API itself - but I don't have any explanation for why you'd get better performance from the Singer/Stitch version of the tap, since it doesn't do any parallel or async work that I could find. Also - assuming this is SDK-based, you might get some boost from the latest version, which had some performance improvements. If you can confirm that the SDK version number is the latest, along with some other specifics around the endpoint being synced, it would be much appreciated.
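For reference, something like this should show whether the tap is SDK-based and which singer-sdk version it's on (a rough sketch - the `--about` flag only exists on SDK-based taps, and the venv path assumes Meltano's default plugin install layout):

```bash
# SDK-based taps expose --about, which prints metadata including the SDK version;
# a non-SDK tap will typically reject the flag.
meltano invoke tap-klaviyo --about --format=json

# Alternatively, inspect the plugin's virtualenv directly
# (path assumes Meltano's default .meltano/ install layout).
.meltano/extractors/tap-klaviyo/venv/bin/pip show singer-sdk
```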
e
Thanks for the help as always @aaronsteers. I'm not sure if it is built off the SDK -- will check it out. We're using this Klaviyo tap: https://github.com/singer-io/tap-klaviyo It is the same as the "v1" you mentioned. Stitch's Klaviyo tap is technically community-maintained, so I don't think it's a special proprietary one of theirs.
a
Interesting... yeah, then I'd expect performance to be the same unless Stitch has a "v2" that's not public.
e
There's a drastic difference, which makes me think we're doing something obviously wrong. What would be some key info we could provide to help troubleshoot? We're currently organizing logs and record counts from comparable periods. Other integrations are very comparable with Stitch.
a
If you still have access to the Stitch environment, I think it would be helpful to have the timing data both ways, with record counts, from one isolated stream - preferably a full table sync so we don't have to worry about differing saved state and starting points.
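Something like this would isolate a single stream and force a full sync on the Meltano side (a sketch - the `receive` stream name and job id are just examples, and flags may vary by Meltano version):

```bash
# Select only one stream so the comparison is isolated
# ("receive" is just an example entity name).
meltano select tap-klaviyo receive "*"
meltano select tap-klaviyo --list   # verify what is now selected

# Run it end to end, ignoring saved state so it's a full-table sync.
meltano elt tap-klaviyo target-redshift --job_id=klaviyo-benchmark --full-refresh
```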
e
Yes, we do! @tom_mcgrail and @kai_yokoyama -- let's sync on this so we can get it organized for the Meltano team.
a
Also helpful to know how you are landing the data, in order to ensure the target isn't holding up the process. The cleanest way to measure tap performance will always be to use `meltano invoke` and push STDOUT straight to a local file - but that may be overkill.
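For example (a rough sketch - the file name is arbitrary, and the exact serialization of the RECORD messages may differ slightly):

```bash
# Time the tap alone, writing Singer messages straight to a local file
# so the target can't slow it down.
time meltano invoke tap-klaviyo > klaviyo_output.jsonl

# Rough record count, for a records-per-second comparison against Stitch.
grep -c '"type": "RECORD"' klaviyo_output.jsonl
```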
The other thing that comes to mind is the network connection strength of the runner. Since Stitch is obviously using a runner in AWS or equivalent, it will be less subject to network "roundtrip" latencies than something running on a local laptop on a home network, for instance.
e
We are using target-redshift from datamill: https://github.com/datamill-co/target-redshift Thanks for the tip re: `meltano invoke` - we'll compare speeds when outputting to a local file.
We are currently hosting Meltano on a Linux EC2 instance (AWS).
Hey @aaronsteers, we're working on the above, but a quick question: below is the process output of the Linux box running Meltano. I highlighted the job for Klaviyo, and it looks like the 'load' process takes much more time and memory than the 'extract' (or maybe I'm misinterpreting the 'l' and 'e'). Is this normal?
The two highlighted columns on the left are VSZ and RSS respectively; the highlighted column on the right is time.
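For reference, we're pulling roughly those columns with something like this (the grep patterns are just our guess at how the tap and target processes are named):

```bash
# VSZ and RSS are reported in KiB; etime is elapsed wall-clock time.
ps -eo pid,vsz,rss,etime,args | grep -E 'tap-klaviyo|target-redshift' | grep -v grep
```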
e
The `datamill` variant denests everything. That usually consumes time and a fair amount of memory.
e
Got it -- even if our schema is only a limited number of columns? We noticed that Stitch is denesting 'receive' more than datamill, but that could be because of some setting. Here are pics:
Meltano:
(only one receive table)
Stitch:
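(For the comparison we're just listing the tables each pipeline created, roughly like this - schema names are placeholders:)

```bash
# List the tables each pipeline wrote to its Redshift schema
# (schema names below are placeholders).
psql "$REDSHIFT_DSN" -c "
  SELECT table_schema, table_name
  FROM information_schema.tables
  WHERE table_schema IN ('meltano_klaviyo', 'stitch_klaviyo')
  ORDER BY 1, 2;"
```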
@edgar_ramirez_mondragon thanks for that explanation!! It explains why the loader is taking much more time. We are beginning to build our own Redshift target (which we'd make available to the community). We're digging into datamill and pipelinewise, which are the two Redshift targets we're using now.
a
@edgar_ramirez_mondragon's explanation about denesting may also explain some of the slowness and the additional memory consumption seen on the target side.