Anyone have any ideas on how to increase the perfo...
# troubleshooting
d
Does anyone have ideas on how to increase the performance of Meltano? I understand this would be, or could be, rather tap-specific, but would increasing the compute help? We are making many batch updates in our core Postgres database (~7M rows updated), which is filling our WAL, and we don't want to continue these batch updates until that WAL backlog is reduced. Obviously Meltano is ingesting/clearing it, but reading through the log could take a few hours. For much of that time, Meltano is working within this loop:
[2022-12-30, 04:35:11 EST] 2022-12-30T09:35:11.424916Z [info     ] time=2022-12-30 09:35:11 name=tap_postgres level=INFO message=Lastest wal message received was 1809/B9460F18 cmd_type=extractor name=tap-postgres--production run_id=0c4e9a15-d237-4ee3-9ff2-066229e92c5e state_id=cmd-production stdio=stderr
Any ideas on how to speed that up, including raising compute, would be appreciated (and would save me a few hours of sleep 😂)
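One way to judge whether the tap is catching up is to turn the LSNs in those log lines into byte offsets. The sketch below assumes the standard Postgres LSN text format (two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit position) and the log wording shown above. The helper names are made up for illustration and are not part of Meltano or the tap.

```python
import re
from typing import Optional

# Matches the LSN in the tap's log line quoted above (assumed wording).
_LSN_IN_LOG = re.compile(r"wal message received was ([0-9A-Fa-f]+/[0-9A-Fa-f]+)")


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '1809/B9460F18' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def extract_lsn(log_line: str) -> Optional[str]:
    """Return the LSN found in a tap log line, or None if there is none."""
    match = _LSN_IN_LOG.search(log_line)
    return match.group(1) if match else None


def wal_bytes_between(older: str, newer: str) -> int:
    """Number of WAL bytes separating two LSNs (newer minus older)."""
    return lsn_to_int(newer) - lsn_to_int(older)


if __name__ == "__main__":
    line = "message=Lastest wal message received was 1809/B9460F18 cmd_type=extractor"
    seen = extract_lsn(line)
    # Hypothetical later position, 0x1000 bytes further along.
    print(wal_bytes_between(seen, "1809/B9461F18"))  # 4096
```

Comparing the tap's latest LSN against the result of `SELECT pg_current_wal_lsn()` on the database (Postgres 10 and later) gives a rough backlog in bytes, and sampling it a few minutes apart shows how quickly that backlog is shrinking.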
j
I don't have an answer for you, but are you reading from the WAL directly, as would happen in normal Postgres replication, just using Meltano instead?
d
Yup, using the Postgres tap's incremental log-based replication
j
I am looking at that now, as we seem to time out a lot on key-based replication.
But we, like you, have some really large tables
d
Usually it's completely fine, but we're updating a crapton of data at once
c
I might be wrong, but your use case seems like a potential candidate for the new "BATCH" message support in Meltano. "BATCH" messages are available out of the box for any tap and target built with the Meltano SDK. Switching from your current PipelineWise Postgres tap to the MeltanoLabs Postgres tap would mean that you won't be using log-based replication anymore, though, and all your tables would need to have a replication key set if you want to replicate them incrementally.
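For anyone weighing that trade-off, here is a minimal sketch of what replication-key (key-based) incremental extraction amounts to, using an in-memory SQLite table so it runs anywhere. The `orders` table, the `updated_at` column, and both function names are hypothetical, and this is not how either Postgres tap is actually implemented.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table with a timestamp replication key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, updated_at) VALUES (?, ?)",
        [
            (1, "2022-12-29T10:00:00"),
            (2, "2022-12-30T04:00:00"),
            (3, "2022-12-30T09:35:11"),
        ],
    )
    return conn


def fetch_since(conn: sqlite3.Connection, bookmark: str):
    """Return rows changed after the bookmark, plus the new bookmark value."""
    rows = conn.execute(
        "SELECT id, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (bookmark,),
    ).fetchall()
    # Keep the old bookmark when nothing new was found.
    new_bookmark = rows[-1][1] if rows else bookmark
    return rows, new_bookmark


if __name__ == "__main__":
    conn = make_demo_db()
    rows, bookmark = fetch_since(conn, "2022-12-29T23:59:59")
    print(rows, bookmark)
```

Two caveats with this approach: a strict `>` comparison can skip rows that share the bookmark value (using `>=` re-reads them instead), and hard deletes never show up because a deleted row has no replication key left to query.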