ryan_andrei_perez
03/25/2024, 10:07 AMpipelinewise
variant of tap-postgres
suddenly not working with either one of the two WALs in a single database instance? There is data on the WAL but the pipeline just says Waiting for first wal message
Raül Bonet
03/25/2024, 10:14 AMpg_logical_slot_peek_binary_changes()
. Even setting the upto_lsn
to 1 lead to the function taking about 45 minutes to start showing content, when it is usually immediate.
In our case, we believe a very big transaction was causing the problem in rendering some messages; once those messages were consumed everything went back to normalryan_andrei_perez
03/25/2024, 10:19 AMWaiting for first wal message
since last Fridayryan_andrei_perez
03/25/2024, 10:19 AMRaül Bonet
03/25/2024, 10:24 AMmax_run_seconds
I think it is. If not the pipeline would stop and start again and all the process was lost.
Apart from that, yes, just wait, but in our case was maybe 1 hour maximum...
Other things you can check: at the beginning of the execution, the pipeline says something like requesting WAL messages up to LSN someting
. You can use the pg_replication_slots
to see the confirmed_flush
and what's the state of the slot.
But yeah, definitely try peeking changes! If that also takes forever, the problem is in the database itself, not your pipelineryan_andrei_perez
03/25/2024, 11:46 AMmax_run_seconds
that was causing the pipeline to be cut off (it was already at 3600 seconds) but the logical_poll_total_seconds
was set to 120 seconds, and it was also a case of a heavy transactionvisch
03/25/2024, 12:44 PMtap-postgres
I know it's not the same tap but I'm pretty sure we'd hit the same thing too!visch
03/25/2024, 12:45 PMwal2json
plugin, I have some consternation about that plugin myself but at the end of the day I don't want to run a different one 😕 so we used the same thingRaül Bonet
03/30/2024, 5:20 PMwrite-in-chunks
option either, which probably only makes things worse.
I am very curious about your reservations and about why you don't want to run a different one. We were concerned about its efficiency in the decoding process; in the end, a binary format like pgoutput
should be faster. Could you elaborate on your reasons?visch
04/04/2024, 1:28 PMpgoutput
would be a better option for the tap. Reason we didn't go with it is because I couldn't find a python lib (quickly that is as I"m sure one exists) that would do the work for us to translate the binary format for us. wal2json did the job by getting us to a serialization format.
I'm not a postgres expert when it comes to the wal. My limited personal experience here normally would have led me to just use Debezium but there's a lot of benefits of doing this on the singer side in Python. I knew this would be a deep knowledge dive so we went with something that'd get us up and running quickly