Hey everyone! I was wondering if anyone had tips for speeding up Meltano pipelines? #troubleshooting

gunnar

07/30/2021, 5:24 PM

Hey everyone! I was wondering if anyone had tips for speeding up Meltano pipelines? I keep running into pipelines that run very slowly and, in some cases, can't keep up with each day's collection.

douwe_maan

07/30/2021, 5:28 PM

@gunnar What tap and target are you using? What are the hardware specifications (CPU, RAM) on the machine/instance?

gunnar

07/30/2021, 5:32 PM

• Tap where the issue comes up the most: tap-klaviyo
• Target: target-redshift
• Instance: Linux EC2, vCPUs: 4, RAM: 16

douwe_maan

07/30/2021, 5:42 PM

@aaronsteers @edgar_ramirez_mondragon Can either of you please weigh in here?

edgar_ramirez_mondragon

07/30/2021, 5:46 PM

Hi @gunnar! You mean the extract-load is on a daily schedule and the execution takes over a day to run?

gunnar

07/30/2021, 5:51 PM

Yes, exactly. It seems to be taking much longer than it should. It is pulling a lot of data (~5-10 million rows), but as an example: a pipeline using the klaviyo tap (only collecting from 1 stream) has been running for 1 day and 18 hours, all to collect 1 day's worth of data. I was hoping to get suggestions for improving the performance/speed of the Meltano pipelines, whether that means increasing the hardware specifications of the instance Meltano is running on or changing some configuration.

aaronsteers

07/30/2021, 6:14 PM

Jumping in here....

aaronsteers

07/30/2021, 6:16 PM

@gunnar - to me, 5-10 million rows doesn't seem like it should take nearly that long. My assumption is that the tap itself may be slow. Is it possible to first try isolating to a single stream, and second, see if you can get metrics for the tap writing directly to disk, for instance with target-jsonl?
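One rough way to get such metrics is to count the RECORD messages the tap emits and divide by the elapsed time. The following is a minimal sketch, not part of Meltano: the function names `count_records` and `main` and the file name `count_records.py` are hypothetical, and it assumes the tap writes standard Singer JSON messages, one per line, to stdout.

```python
import json
import sys
import time
from collections import Counter


def count_records(lines):
    """Count Singer RECORD messages per stream, ignoring non-JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything on stdout that is not a Singer message
        if isinstance(message, dict) and message.get("type") == "RECORD":
            counts[message.get("stream", "unknown")] += 1
    return counts


def main():
    """Read Singer output from stdin and report rows per second on stderr."""
    start = time.monotonic()
    counts = count_records(sys.stdin)
    elapsed = time.monotonic() - start
    for stream, n in counts.items():
        rate = n / elapsed if elapsed > 0 else float("nan")
        print(f"{stream}: {n} records in {elapsed:.1f}s ({rate:.0f} rows/s)",
              file=sys.stderr)
```

Saved as `count_records.py` with a final `main()` call added, it could be fed from the tap alone, for example `meltano invoke tap-klaviyo | python count_records.py`. Because nothing is loaded anywhere, the resulting rate reflects only the tap and the network between it and the API.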

edgar_ramirez_mondragon

07/30/2021, 6:17 PM

It looks like the singer variant is already using the max page size of 100, so you may not be able to increase network throughput that way. Maybe the batch size on the target side can be increased?

aaronsteers

07/30/2021, 6:17 PM

A couple of other possible explanations would be network proximity (best to have things in the same region when possible) and slowness in the target. Since the Redshift target is pretty mainstream, I'd lean towards an issue in the tap or regional/network slowness.

aaronsteers

07/30/2021, 6:19 PM

Cc @amanda.folson - possible "Performance Troubleshooting" docs topic? 😉

edward_ryan

07/30/2021, 7:14 PM

@gunnar sounds like an issue with the klaviyo tap (instead of the redshift target) ?

edward_ryan

07/31/2021, 10:40 AM

@aaronsteers would our hardware slow it down at all? We're currently using a SQLite db and need to beef up our production environment. In terms of reviewing the Klaviyo tap, where are the best places to check in the code? @gunnar I think you already checked batch size for writing to Redshift, correct?

visch

08/02/2021, 12:24 PM

Hitting some performance issues on an unrelated tap (tap-oracle); my problem is actually with my target (target-mssql, not as mainstream). It's not easy to know where in the pipeline your bottleneck is. Mostly typing this up for @amanda.folson 🙂
1. Is your target getting behind your tap? (This "shouldn't happen" in most scenarios, but if there was a way to confirm this it'd be great.)
2. (If 1 is no) then it's your tap's issue. The next question is which portion of the tap is going slow. If it's an HTTP-based tap, then network connectivity speed can have a decent impact. I've also seen a better CPU dramatically improve run times, because my tap and target both use a single thread to serialize/deserialize data (maybe the SingerSDK fixes this?).
2.1. How do you tell which portion of your tap is the bottleneck? Right now benchmarking your tap is the best way to do this, but it's fairly technical. Seems like a great place for tooling, or for Meltano to hook into the best tools available today.
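For question 1, one possible check is a small pass-through filter placed between the tap and the target that times how long it waits on each side. This is only a sketch under assumptions: `relay` and `diagnose` are hypothetical names, not Meltano or Singer features, and it assumes the tap and target are run by hand as a shell pipe rather than through `meltano elt`. It relies on ordinary pipe back-pressure: once the pipe to a slow target fills up, writes start to block.

```python
import sys
import time


def relay(src, dst, clock=time.perf_counter):
    """Copy lines from src to dst, timing the waits on each side."""
    stats = {"lines": 0, "read_wait": 0.0, "write_wait": 0.0}
    while True:
        t0 = clock()
        line = src.readline()  # blocks while the tap has nothing ready
        t1 = clock()
        stats["read_wait"] += t1 - t0
        if not line:
            break  # end of input: the tap has finished
        dst.write(line)
        dst.flush()  # blocks once the pipe to a slow target is full
        stats["write_wait"] += clock() - t1
        stats["lines"] += 1
    return stats


def diagnose(stats):
    """Name the side the relay spent more time waiting on."""
    return "target" if stats["write_wait"] > stats["read_wait"] else "tap"


def main():
    """Relay stdin to stdout and print the timing summary on stderr."""
    stats = relay(sys.stdin, sys.stdout)
    print(f"{stats} -> likely bottleneck: {diagnose(stats)}", file=sys.stderr)
```

With a final `main()` call added and the file saved as, say, `relay.py`, it could sit in a pipe such as `meltano invoke tap-oracle | python relay.py | meltano invoke target-mssql`. Flushing after every line adds overhead of its own, so the numbers are a diagnostic hint rather than a benchmark.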

edward_ryan

08/02/2021, 2:21 PM

@gunnar let's try these steps

amanda.folson

08/02/2021, 3:22 PM

Thanks for the write-up! I don't have answers for these but we definitely should
