# troubleshooting
**ryan_andrei_perez:** Does anyone have trouble with the `pipelinewise` variant of `tap-postgres` suddenly not working with either one of the two WALs in a single database instance? There is data on the WAL, but the pipeline just says `Waiting for first wal message`.
**Raül Bonet:** Hello @ryan_andrei_perez. Whenever that happened to us, it has always been that the Postgres instance was having (a lot of) trouble allocating resources to deliver WAL messages, either because the database was busy or because there were very big transactions, and loading those requires a lot of resources. That got confirmed in the end by manually logging in to the Postgres instance and using the function `pg_logical_slot_peek_binary_changes()`. Even setting the `upto_lsn` to 1 led to the function taking about 45 minutes to start showing content, when it is usually immediate. In our case, we believe a very big transaction was causing the problem in rendering some messages; once those messages were consumed, everything went back to normal.
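For reference, a minimal sketch of that manual check, peeking at a slot without consuming its changes. The slot name `pipelinewise_slot` is a placeholder, not from the thread; look up your actual slot name in `pg_replication_slots`:

```sql
-- Sketch: peek at pending changes on a logical replication slot without
-- consuming them. 'pipelinewise_slot' is a placeholder slot name.
-- Limiting upto_nchanges keeps the call small, although a huge in-flight
-- transaction can still make it very slow to return anything.
SELECT lsn, xid, data
FROM pg_logical_slot_peek_binary_changes(
    'pipelinewise_slot',  -- slot_name
    NULL,                 -- upto_lsn (NULL = no LSN limit)
    1                     -- upto_nchanges (stop after the first change)
);
```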
**ryan_andrei_perez:** Hi @Raül Bonet! Appreciate the response! Did you guys do something to the state (or config), or just wait? If it's the latter... this one has been stuck on `Waiting for first wal message` since last Friday.
Peeking at the changes does seem like an interesting approach to debug this further, I'll check that!
**Raül Bonet:** No problem, happy to try to save you the nightmare that debugging this was for us, haha. The only thing we did manually was unset a config parameter that made the pipeline stop after some time (`max_run_seconds`, I think). Otherwise the pipeline would stop, start again, and all the progress was lost. Apart from that, yes, just wait, but in our case it was maybe 1 hour maximum... Other things you can check: at the beginning of the execution, the pipeline says something like `requesting WAL messages up to LSN <something>`. You can use the `pg_replication_slots` view to see `confirmed_flush_lsn` and what the state of the slot is. But yeah, definitely try peeking changes! If that also takes forever, the problem is in the database itself, not your pipeline.
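A quick sketch of that slot check; column names are as in the `pg_replication_slots` view, and `pg_wal_lsn_diff()` gives the decoding backlog in bytes:

```sql
-- Sketch: inspect the slot state and how far behind the consumer is.
-- confirmed_flush_lsn is the last position the client has acknowledged;
-- a large gap to pg_current_wal_lsn() means a lot of WAL still to decode.
SELECT slot_name,
       plugin,
       active,
       restart_lsn,
       confirmed_flush_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots;
```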
**ryan_andrei_perez:** You're a lifesaver, man! In our case it was not `max_run_seconds` that was causing the pipeline to be cut off (it was already at 3600 seconds) but `logical_poll_total_seconds`, which was set to 120 seconds, and it was also a case of a heavy transaction.
**visch:** Great info here, just wanted to say that's amazing. @Raül Bonet would it be ok for me to capture this and throw it in the readme for MeltanoLabs' `tap-postgres`? I know it's not the same tap, but I'm pretty sure we'd hit the same thing too!
I wonder if it's caused by the `wal2json` plugin; I have some reservations about that plugin myself, but at the end of the day I don't want to run a different one 😕 so we used the same thing.
**Raül Bonet:** Hello @visch. Sorry, I have been sick and not able to follow up on this! Yes, please. I must say a huge part of the merit goes to @maarten_van_gijssel (GitHub here), my tech lead at HackerOne, with whom we debugged this (in case you want to acknowledge somebody 😉). About `wal2json`: we had the same theory. The version of the tap that we are using does not leverage the `write-in-chunks` option either, which probably only makes things worse. I am very curious about your reservations and about why you don't want to run a different one. We were concerned about its efficiency in the decoding process; in the end, a binary format like `pgoutput` should be faster. Could you elaborate on your reasons?
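For what it's worth, `wal2json` options such as `write-in-chunks` can also be passed when peeking at a slot manually, which can help confirm whether one giant transaction is the culprit. A rough sketch, again with a placeholder slot name, and assuming the slot was created with the `wal2json` plugin using format version 1 (where `write-in-chunks` applies):

```sql
-- Sketch: peek with wal2json's write-in-chunks option so a huge transaction
-- is emitted in smaller pieces instead of one enormous JSON document.
-- 'pipelinewise_slot' is a placeholder; options are passed as name/value pairs.
SELECT lsn, xid, data
FROM pg_logical_slot_peek_changes(
    'pipelinewise_slot',
    NULL,                  -- upto_lsn
    10,                    -- upto_nchanges
    'write-in-chunks', '1'
);
```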
**visch:** Created https://github.com/MeltanoLabs/tap-postgres/issues/402 to document this at some point in the tap readme. Thanks for the heads up, and thank you @maarten_van_gijssel 😄 I think you're right @Raül Bonet that using `pgoutput` would be a better option for the tap. The reason we didn't go with it is that I couldn't find a Python lib (quickly, that is, as I'm sure one exists) that would do the work of translating the binary format for us; `wal2json` did the job by getting us to a serialization format. I'm not a Postgres expert when it comes to the WAL. My limited personal experience here normally would have led me to just use Debezium, but there are a lot of benefits to doing this on the Singer side in Python. I knew this would be a deep knowledge dive, so we went with something that'd get us up and running quickly.