Anyone have any ideas on how to increase the performance of Meltano #troubleshooting

devon_seitz

12/30/2022, 4:46 PM

Anyone have any ideas on how to increase the performance of meltano? I understand that this would be rather tap specific, or could be, but would increasing the compute help? We are making many batch updates in our core postgres database (~7m rows updated) which is filling our WAL and we don’t want to continue these batch updates until that translog is reduced. Obviously meltano is ingesting/clearing it but that reading of the log could take a few hours. A lot of that time it is meltano working within this loop:

[2022-12-30, 04:35:11 EST] 2022-12-30T09:35:11.424916Z [info     ] time=2022-12-30 09:35:11 name=tap_postgres level=INFO message=Lastest wal message received was 1809/B9460F18 cmd_type=extractor name=tap-postgres--production run_id=0c4e9a15-d237-4ee3-9ff2-066229e92c5e state_id=cmd-production stdio=stderr

Any ideas on how to speed that up, including raising compute would be appreciated (and save me a few hours of sleep 😂 )
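
One way to gauge how far behind the tap is would be to compare the LSN in that log line with the server's current WAL position. The following is a minimal sketch, assuming the standard Postgres LSN text format of two hexadecimal numbers separated by a slash (high 32 bits, then low 32 bits). The function names are hypothetical and not part of Meltano or tap-postgres; the target LSN would have to come from the database itself, for example from `SELECT pg_current_wal_lsn()`.

```python
from typing import Optional


def lsn_to_int(lsn: str) -> int:
    """Convert a Postgres LSN such as '1809/B9460F18' to an absolute byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def wal_bytes_between(older: str, newer: str) -> int:
    """Number of WAL bytes between two LSNs (negative if 'newer' is actually older)."""
    return lsn_to_int(newer) - lsn_to_int(older)


def estimate_seconds_remaining(
    prev_lsn: str, curr_lsn: str, target_lsn: str, interval_seconds: float
) -> Optional[float]:
    """Rough time-to-catch-up estimate from two LSNs logged interval_seconds apart.

    Hypothetical helper: assumes the tap keeps reading at the observed rate and
    that target_lsn stays fixed (no new WAL is written in the meantime).
    """
    rate = wal_bytes_between(prev_lsn, curr_lsn) / interval_seconds
    if rate <= 0:
        return None  # no forward progress observed, so no estimate is possible
    remaining = max(0, wal_bytes_between(curr_lsn, target_lsn))
    return remaining / rate
```

Feeding it two consecutive "wal message received" LSNs from the logs plus the current server LSN gives a ballpark for whether more compute is actually changing the read rate.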

jaye_howell

12/30/2022, 4:56 PM

I don't have an answer for you, but are you reading from the WAL directly, as would happen in normal Postgres replication, just using Meltano instead?

devon_seitz

12/30/2022, 5:07 PM

Yup using the postgres tap’s incremental log based replication

jaye_howell

12/30/2022, 5:09 PM

I am looking at that now, as we seem to time out a lot on key-based replication.

jaye_howell

12/30/2022, 5:09 PM

But we, like you, have some really large tables

devon_seitz

12/30/2022, 5:30 PM

Usually it’s completely fine, but we’re updating a crapton of data at once

christoph

01/02/2023, 9:36 PM

I might be wrong, but your use case seems like a potential candidate for the new "BATCH" message support in Meltano. "BATCH" messages are available out of the box for any tap and target built with the Meltano SDK. Switching from your current PipelineWise postgres tap to the MeltanoLabs postgres tap would mean that you won't be using LOG-based replication anymore, though, and all your tables would need to have a replication key set if you want to replicate them incrementally.
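
To illustrate what a replication key means in practice, here is a minimal sketch of key-based incremental extraction. It uses Python's built-in `sqlite3` only so that it runs anywhere; it is not how either postgres tap is implemented, and `extract_incremental` is a hypothetical name. The idea is that each run selects only rows whose replication key is greater than the bookmark saved by the previous run.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


def extract_incremental(
    conn: sqlite3.Connection, table: str, replication_key: str, bookmark: Any
) -> Tuple[List[Dict[str, Any]], Any]:
    """Return rows with replication_key > bookmark, plus the new bookmark.

    Assumption: table and replication_key are trusted identifiers (they are
    interpolated into the SQL text); only the bookmark value is parameterized.
    """
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {replication_key} > ? "
        f"ORDER BY {replication_key}"
    )
    cursor = conn.execute(query, (bookmark,))
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    # The highest key seen becomes the bookmark for the next run.
    new_bookmark = rows[-1][replication_key] if rows else bookmark
    return rows, new_bookmark
```

Unlike LOG-based replication, this approach only picks up rows whose key advances (for example an `updated_at` column that is bumped on every write), and it does not see deletes.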
