# troubleshooting
r
Hello Meltano community! I'm currently having some issues where certain pipelines are taking 3 hours to run. I'm using tap-spreadsheets-anywhere and target-postgres. I'm guessing it's more of a machine utilization issue, and I set up my GKE cluster to autoscale, but I'm still trying to narrow that down. Would anyone be able to help with tips on how to make these run faster, or is this just to be expected? Thank you!
v
Take a look at the open PRs for your target. I vaguely remember someone pushing up a performance improvement that I doubt has been merged!
d
@ricky_renner How many rows are we talking about here?
l
I think I was the one who pushed a PR about this.
@ricky_renner I actually did a PR for each side https://github.com/ets/tap-spreadsheets-anywhere/pull/18 and https://github.com/datamill-co/target-postgres/pull/204. They have not been merged yet, but they should definitely help out. For my use case, I dropped from 45-50 minutes down to about 7-8. If you have some extra benchmarking to share, don't hesitate to add it to the PRs, because I only tried on 1-2 pipelines.
d
Wow, that's quite some performance improvement! Nice work @laurent
v
@laurent awesome job!
r
Wow thanks @laurent!
And @douwe_maan, it is 318521 rows.
I think what happened was that my GCP quota for CPUs was maxed out, so my GKE cluster wasn't actually autoscaling. But still, this same pipeline took 30 minutes this morning.
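For a rough sense of scale, the numbers in this thread can be turned into average throughput. This is only a back-of-the-envelope sketch: it assumes the 3-hour run and the 30-minute run both loaded the same 318521 rows, which the thread implies ("this same pipeline") but does not state outright. The helper name `rows_per_second` is made up for illustration.

```python
def rows_per_second(rows: int, minutes: float) -> float:
    """Average throughput for a run that loaded `rows` rows in `minutes` minutes."""
    return rows / (minutes * 60)


ROWS = 318_521  # row count reported in the thread

# Assumption: both runs processed the same number of rows.
slow = rows_per_second(ROWS, 180)  # the 3-hour run
fast = rows_per_second(ROWS, 30)   # the 30-minute run

print(f"3-hour run:    {slow:.1f} rows/s")
print(f"30-minute run: {fast:.1f} rows/s")
print(f"ratio:         {fast / slow:.1f}x")
```

Under that assumption, the slow run works out to roughly 30 rows per second and the faster one to roughly 177, so comparing rows per second before and after a change (more CPU quota, or the unmerged PRs) is one simple way to tell whether it helped.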