# best-practices
r
Seeing as only one Meltano instance can work on one STATE_ID at a time, what are the best practices for speeding up a tap-postgres <> target-postgres `el`? I've got a table with 66 million rows that I would like to use Meltano to keep track of with CDC, but the initial replication takes way longer than Airbyte (8 hrs vs. Airbyte's 2.5 hrs), which we are trying to move away from for a variety of reasons. Any help greatly appreciated!
a
Hi @Rhys Davies, check out
`--state-id-suffix`
https://docs.meltano.com/reference/command-line-interface/#parameters-3 I presume this state ID is an attempt to run the sync of various tables in parallel? Other ideas to address the performance issues: • Would LOG_BASED be an option in your variant (https://github.com/transferwise/pipelinewise-tap-postgres)? Once caught up, it should be able to keep up. • A tap -> target that leverages BATCH would hugely improve performance. • Anyone else got any ideas?
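To make the parallel-sync idea concrete, here's a minimal sketch of fanning out one `meltano run` per large table, each with its own state via `--state-id-suffix`. The table names are hypothetical, and it's written as a dry run for safety:

```shell
# Dry-run sketch: print one `meltano run` invocation per table, each with
# its own state ID suffix so the pipelines track state independently.
# Table names are hypothetical; drop `echo` and background each command
# with `&` (plus a final `wait`) to actually launch the syncs in parallel.
for table in public-orders public-events public-items; do
  echo meltano run tap-postgres target-postgres --state-id-suffix "$table"
done
```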
👍 1
e
Yeah, the most common way to make syncs more performant is by splitting each stream into its own pipeline. That said, I hope to add benchmarks to target-postgres in the near future and try to incrementally improve load performance.
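One way to sketch that per-stream split (hedged: the stream name is made up, and this assumes Meltano's `select_filter` extra, which can be set through the plugin's environment variable) is to pair stream selection with a matching state ID suffix:

```shell
# Dry-run sketch: scope one pipeline to a single stream via the
# select_filter extra's env var, and give it a matching state ID suffix
# so each stream keeps independent state. The stream name is hypothetical;
# drop `echo` to actually run the sync.
export TAP_POSTGRES__SELECT_FILTER='["public-big_table"]'
echo meltano run tap-postgres target-postgres --state-id-suffix public-big_table
```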
r
Thanks for the responses. Yep, once CDC is caught up the syncs are pretty fast, and all the tables in that particular database sync in a few minutes or less. I was worried about something like a schema change on this large table forcing a --full-refresh, and people being annoyed that they'd have to wait some time for an update, but I have some ideas for that sort of scenario. (I've also been promised that this table will not change 😛 - we will see.)
🤞 1