# troubleshooting
jobert:
Hi everyone, I wanted to drop a learning here that had us confused earlier this week: we use logical replication with wal2json and pipelinewise-tap-postgres to replicate an RDS instance into Snowflake, and we saw text columns disappearing in Snowflake as soon as a row was updated on the RDS instance. It turned out that wal2json doesn’t always include large(r) text blobs in the WAL message: unchanged TOASTed values are left out. When the wal2json message is processed by the singer / meltano pipeline, it sets those columns to NULL instead of leaving them untouched. For anyone who wants to read more about this: https://github.com/eulerto/wal2json/pull/99 and https://www.postgresql.org/docs/9.5/storage-toast.html.
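To make the failure mode concrete, here’s a simplified sketch (made-up table and column names, not the actual tap/target code) of what happens when an UPDATE doesn’t touch a large TOASTed column: wal2json omits it from the message, and a naive “fill in every known column” step turns the missing value into NULL.

```python
# Simplified sketch of the failure mode (hypothetical code, not pipelinewise's).
# wal2json describes an UPDATE with parallel columnnames/columnvalues lists; a
# large TOASTed column that didn't change is simply not listed.

TABLE_COLUMNS = ["id", "title", "body"]  # columns the tap knows for this table

wal2json_update = {
    "kind": "update",
    "columnnames": ["id", "title"],      # "body" is TOASTed and unchanged, so omitted
    "columnvalues": [42, "new title"],
}

def build_record_naive(change):
    """Fill every known column; absent ones default to None -- the bug we hit."""
    present = dict(zip(change["columnnames"], change["columnvalues"]))
    return {col: present.get(col) for col in TABLE_COLUMNS}

def build_record_sparse(change):
    """Only emit the columns wal2json actually sent, leaving the rest untouched."""
    return dict(zip(change["columnnames"], change["columnvalues"]))

print(build_record_naive(wal2json_update))
# {'id': 42, 'title': 'new title', 'body': None}  -> body gets NULLed in Snowflake
print(build_record_sparse(wal2json_update))
# {'id': 42, 'title': 'new title'}                -> body left as-is
```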
jozef_reginac:
Hey Jobert, thanks a lot for the message. I just debugged our issue using your post. Have you fixed this somehow? The only thing I can think of now is to switch to incremental loads, since the issue on wal2json hasn’t been addressed for about 3 years.
jobert:
@jozef_reginac Glad we’re not the only ones that ran into this 😅 @martijn_wouters and @maarten_van_gijssel can speak to what we did to solve it. I do know that we’re running a fork of the tap right now, but they know all the details. @maarten_van_gijssel is currently on PTO, so perhaps @martijn_wouters can pitch in.
martijn_wouters:
@jozef_reginac we’ve experimented with a couple of different changes. Currently it’s a change in the target https://github.com/transferwise/pipelinewise-target-snowflake/compare/master...Hacker0x01:master
The idea behind the change is that we create separate “buckets” for the same resource based on the different possible schemas.
`tap-postgres` sends records over to `target-snowflake`, and sometimes fields are missing due to the TOAST mechanism.
We generate possible schemas based on the different sets of fields within the same resource
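Roughly like this, as a simplified sketch (hypothetical record values, not the code from the fork linked above): records for one resource are grouped by the exact set of fields they contain, and each group gets its own schema/bucket.

```python
# Hypothetical sketch of the bucketing idea: group records of one resource by
# the exact set of fields present, so every bucket has a consistent column set.
from collections import defaultdict

records = [
    {"id": 1, "title": "a", "body": "full row"},
    {"id": 2, "title": "b"},                      # "body" missing (TOASTed, unchanged)
    {"id": 3, "title": "c", "body": "another full row"},
]

buckets = defaultdict(list)
for record in records:
    # The bucket key doubles as the schema variant these records belong to.
    buckets[tuple(sorted(record))].append(record)

for fields, rows in buckets.items():
    print(fields, "->", len(rows), "record(s)")
# ('body', 'id', 'title') -> 2 record(s)
# ('id', 'title') -> 1 record(s)
```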
This only works reliably with the Parquet file format. Using the CSV file format results in errors because the generated CSV has fewer columns than the target table.
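For intuition, a toy example (not the target’s actual load path): a CSV file is purely positional, so a bucket that lacks a column produces rows with fewer fields than the table expects, while a Parquet file carries its own schema and column names.

```python
# Toy illustration of the CSV vs Parquet difference (not the target's code).
import csv
import io

import pyarrow as pa  # assuming pyarrow is available for the Parquet side

bucket_rows = [{"id": 2, "title": "b"}]  # the bucket where "body" is absent

# CSV: positional columns only; this file has 2 columns, the table has 3.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "title"])
writer.writeheader()
writer.writerows(bucket_rows)
print(buf.getvalue().splitlines())      # ['id,title', '2,b']

# Parquet: field names travel with the data, so the loader can match by name.
table = pa.Table.from_pylist(bucket_rows)
print(table.schema.names)               # ['id', 'title']
```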
Let me know if that makes sense! Happy to explain in more detail if needed 🙂