Meltano #troubleshooting

ricky_renner

04/23/2021, 1:52 AM

Hello Meltano community! I'm currently having some issues where certain pipelines are taking 3 hours to run. I'm using tap-spreadsheets-anywhere and target-postgres. I'm guessing it's more of a machine utilization issue, and I set up my GKE cluster to autoscale, but I'm still trying to narrow that down. Would anyone be able to help with tips on how to make these run faster, or is this just to be expected? Thank you!

visch

04/23/2021, 1:57 AM

Take a look at the open PRs for your target. I vaguely remember someone pushing up a performance improvement that I doubt has been merged yet!

douwe_maan

04/23/2021, 3:17 PM

@ricky_renner How many rows are we talking about here?

laurent

04/23/2021, 3:36 PM

I think I was the one who pushed a PR about this.

laurent

04/23/2021, 3:39 PM

@ricky_renner I actually did a PR for each side https://github.com/ets/tap-spreadsheets-anywhere/pull/18 and https://github.com/datamill-co/target-postgres/pull/204. They have not been merged yet, but they should definitely help out. For my use case, I dropped from 45-50 minutes down to about 7-8. If you have some extra benchmarking to share, don't hesitate to add it to the PRs, because I only tried on 1-2 pipelines.

douwe_maan

04/23/2021, 3:44 PM

Wow, that's quite some performance improvement! Nice work @laurent

visch

04/23/2021, 3:51 PM

@laurent awesome job!

ricky_renner

04/23/2021, 4:17 PM

Wow thanks @laurent!

ricky_renner

04/23/2021, 4:17 PM

And @douwe_maan it is 318,521 rows

ricky_renner

04/23/2021, 4:18 PM

I think what happened was that my GCP quota for CPUs was maxed out, so my GKE cluster wasn't actually autoscaling. Even so, this same pipeline took 30 minutes this morning
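
For a rough sense of scale, the figures quoted in this thread can be turned into throughput numbers. This is only a back-of-the-envelope sketch: it assumes the 3-hour and 30-minute runs both moved the same 318,521 rows, and that the durations are wall-clock times for the whole pipeline. The function name is made up for illustration.

```python
# Row count and durations are the ones quoted in the thread; treating them
# as comparable runs over the same data is an assumption.
ROWS = 318_521


def rows_per_second(rows: int, minutes: float) -> float:
    """Average throughput for a run that moved `rows` rows in `minutes`."""
    return rows / (minutes * 60)


slow = rows_per_second(ROWS, 180)  # the 3-hour run
fast = rows_per_second(ROWS, 30)   # the 30-minute run

print(f"3 h run:    {slow:.1f} rows/s")   # 29.5 rows/s
print(f"30 min run: {fast:.1f} rows/s")   # 177.0 rows/s
print(f"ratio:      {fast / slow:.1f}x")  # 6.0x
```

Even the faster run averages under 200 rows per second, so the unmerged PRs linked above may still be worth benchmarking once the CPU quota is sorted out.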
