# troubleshooting
g
Hey everyone! I was wondering if anyone had tips for speeding up Meltano pipelines? I've been running into a constant issue of pipelines running very slowly and, in some cases, not being able to keep up with each day's collection.
d
@gunnar What tap and target are you using? What are the hardware specifications (CPU, RAM) on the machine/instance?
g
• Tap where the issue comes up the most: tap-klaviyo
• Target: target-redshift
• Instance: Linux EC2, vCPUs: 4, RAM: 16
d
@aaronsteers @edgar_ramirez_mondragon Can either of you please weigh in here?
e
Hi @gunnar! You mean the extract-load is on a daily schedule and the execution takes over a day to run?
g
Yes, exactly. It seems to be taking much longer than it should. It is pulling a lot of data (~5-10 million rows), but as an example: a pipeline using the Klaviyo tap (collecting from only 1 stream) has been running for 1 day and 18 hours, all to collect 1 day's worth of data. I was hoping to get suggestions for improving the performance/speed of the Meltano pipelines, whether that means increasing the hardware specifications of the instance Meltano is running on or changing some configuration.
a
Jumping in here....
@gunnar - to me, 5-10 million rows doesn't seem like it should take nearly that long. My assumption is that the tap itself may be slow. Is it possible to, first, try isolating to a single stream and, second, see if you can get metrics of the tap writing directly to disk, for instance with target-jsonl?
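One way to get those tap-only metrics is to count the Singer RECORD messages the tap emits and divide by the elapsed time. The snippet below is a minimal sketch, not part of Meltano or any tap: `count_records` and `rows_per_second` are hypothetical helper names, and the `events` stream in the sample is made up for illustration.

```python
import json
import time
from collections import Counter


def count_records(lines, clock=time.monotonic):
    """Count Singer RECORD messages per stream and time the whole read."""
    counts = Counter()
    start = clock()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON message (e.g. log noise)
        if isinstance(message, dict) and message.get("type") == "RECORD":
            counts[message.get("stream", "unknown")] += 1
    return counts, clock() - start


def rows_per_second(total_rows, elapsed_seconds):
    """Average throughput; returns 0.0 when no time has elapsed."""
    return total_rows / elapsed_seconds if elapsed_seconds > 0 else 0.0


# Tiny in-memory demo with a hypothetical "events" stream.
sample = [
    '{"type": "SCHEMA", "stream": "events", "schema": {}}',
    '{"type": "RECORD", "stream": "events", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "events", "record": {"id": 2}}',
    '{"type": "STATE", "value": {}}',
]
counts, elapsed = count_records(sample)
print(dict(counts), f"{rows_per_second(sum(counts.values()), elapsed):.0f} rows/s")
```

To measure a real tap, the same function could be pointed at `sys.stdin` in a small script and fed from the tap's output, for example by piping `meltano invoke tap-klaviyo` into it. If the rate is low even with no target attached, the tap (or the API behind it) is the likely bottleneck.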
e
It looks like the singer variant is already using the max page size of 100, so you may not be able to increase network throughput that way. Maybe the batch size on the target side can be increased?
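To put the page size in perspective, here is a rough back-of-the-envelope calculation using only numbers from this thread (5-10 million rows, 100 rows per page, 1 day 18 hours of runtime). It assumes every page is full, requests are made strictly one after another, and the whole runtime is spent on the tap side; none of that has been confirmed, so treat it as an estimate only. The function names are made up for this sketch.

```python
import math


def requests_needed(rows, page_size):
    """Number of paged API requests needed to fetch `rows` rows."""
    return math.ceil(rows / page_size)


def seconds_per_request(total_seconds, rows, page_size):
    """Average wall-clock time per request if requests run sequentially."""
    return total_seconds / requests_needed(rows, page_size)


run_seconds = (24 + 18) * 3600  # 1 day and 18 hours
page_size = 100                 # max page size mentioned above

for rows in (5_000_000, 10_000_000):
    n = requests_needed(rows, page_size)
    avg = seconds_per_request(run_seconds, rows, page_size)
    print(f"{rows:,} rows -> {n:,} requests, ~{avg:.2f} s per request")
```

Under those assumptions the run works out to roughly 1.5 to 3 seconds per page of 100 rows, which would point at per-request latency (API response time, rate limiting, or network distance) rather than raw data volume.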
a
A couple of other possible explanations would be network proximity (best to have things in the same region when possible) and slowness in the target. Since the Redshift target is pretty mainstream, I'd lean towards an issue in the tap or regional/network slowness.
Cc @amanda.folson - possible "Performance Troubleshooting" docs topic? 😉
e
@gunnar sounds like an issue with the Klaviyo tap (rather than the Redshift target)?
@aaronsteers would our hardware slow it down at all? We’re currently using a SQLite db and need to beef up our production environment. In terms of reviewing the Klaviyo tap, where are the best places to check in the code? @gunnar I think you already checked the batch size for writing to Redshift, correct?
v
I'm hitting some performance issues on an unrelated tap (tap-oracle); my problem is actually with my target (target-mssql, not as mainstream). It's not easy to know where in the pipeline your bottleneck is. Mostly typing this up for @amanda.folson 🙂
1. Is your target getting behind your tap? (This "shouldn't happen" in most scenarios, but it'd be great if there was a way to confirm it.)
2. If the answer to 1 is no, then it's your tap's issue. The next question is which portion of the tap is slow. If it's an HTTP-based tap, network connectivity speed can have a decent impact. I've also seen a better CPU dramatically improve run times, because my tap and target each use a single thread to serialize/deserialize data (maybe the Singer SDK fixes this?).
2.1. How do you tell which portion of your tap is the bottleneck? Right now, benchmarking your tap is the best way to do this, but it's fairly technical. This seems like a great place for tooling, or for Meltano to hook into the best tools available today.
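For question 1, one rough way to check whether the target is falling behind is to put a small pass-through between the tap and the target and time how long it waits on each side. This is only a sketch under assumptions: `relay` is a hypothetical helper, not a Meltano or Singer feature, and OS pipe buffering means write waits only show up once the target has fallen far enough behind to fill the buffer.

```python
import io
import time


def relay(src, dst, clock=time.monotonic):
    """Copy lines from src to dst, timing the waits on each side.

    A large read_wait suggests the upstream (tap) is the slow side;
    a large write_wait suggests the downstream (target) is not keeping up.
    """
    read_wait = 0.0
    write_wait = 0.0
    lines = 0
    while True:
        t0 = clock()
        line = src.readline()  # blocks while the tap has nothing to send
        t1 = clock()
        read_wait += t1 - t0
        if not line:
            break  # end of input
        dst.write(line)
        dst.flush()            # blocks while the target's pipe is full
        write_wait += clock() - t1
        lines += 1
    return {"lines": lines, "read_wait": read_wait, "write_wait": write_wait}


# In-memory demo; real use would pass sys.stdin and sys.stdout instead.
source = io.StringIO('{"type": "RECORD"}\n{"type": "STATE"}\n')
sink = io.StringIO()
stats = relay(source, sink)
print(stats["lines"], "lines relayed")
```

In a real run, a script calling `relay(sys.stdin, sys.stdout)` and printing the stats to stderr could sit in a shell pipe between the tap and target commands. Comparing the two wait totals gives a first hint about which side to benchmark further.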
e
@gunnar let's try these steps
a
Thanks for the write-up! I don't have answers for these but we definitely should