adrian_soltesz
07/21/2023, 1:12 PM
tap-postgres (pipelinewise version) and target-redshift (also the pipelinewise version). To keep the data fresh, we run our pipeline frequently, usually every 15 minutes. As some of the tables are very big, we use incremental replication.
The problem is that we sometimes get row counts that don't match the source. For example, the Postgres source has 4999876 rows but the Redshift target has only 4999870, or something like that. This probably happens when a replication fails halfway for some reason, such as a connection issue.
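A mismatch like this can be checked automatically rather than by hand. Below is a minimal sketch, assuming both databases are reachable through DB-API 2.0 connections (for example psycopg2, which works against both Postgres and Redshift). It compares COUNT(*) for each (source table, target table) pair and reports the pairs that differ. The table names are hypothetical, and the demo uses in-memory sqlite3 databases as stand-ins so it runs anywhere.

```python
import sqlite3
from typing import Dict, Iterable, Tuple


def row_count(conn, table: str) -> int:
    """Return COUNT(*) for one table using any DB-API 2.0 connection."""
    # Table names cannot be bound as query parameters, so `table` must come
    # from a trusted, hard-coded list, never from user input.
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        (count,) = cur.fetchone()
    finally:
        cur.close()
    return count


def find_mismatches(
    source_conn, target_conn, table_pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Tuple[int, int]]:
    """Map each source table whose count differs to (source_count, target_count)."""
    mismatches = {}
    for source_table, target_table in table_pairs:
        src = row_count(source_conn, source_table)
        tgt = row_count(target_conn, target_table)
        if src != tgt:
            mismatches[source_table] = (src, tgt)
    return mismatches


if __name__ == "__main__":
    # Stand-ins for the Postgres source and the Redshift target (hypothetical table).
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for conn, n_rows in ((source, 10), (target, 7)):
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
        conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(n_rows)])
    # In a scheduled job, a non-empty result could fail the run or send an alert.
    print(find_mismatches(source, target, [("orders", "orders")]))  # {'orders': (10, 7)}
```

Table pairs are used so that source and target schema names can differ. Because the source keeps receiving writes between the 15-minute runs, some lag is expected; a check like this is most meaningful right after a run finishes, or with both counts restricted to rows at or below the last replicated key value.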
When we encounter these problems, we usually have to trigger a full replication by deleting some data from the system database.
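If the state kept in the system database follows the usual Singer layout (a "bookmarks" object keyed by stream ID), the reset could be narrowed to the affected stream instead of all streams. A minimal sketch, assuming that layout; the stream IDs and bookmark fields are hypothetical, and loading and saving the state is left out because it depends on how the system database is accessed.

```python
import copy
from typing import Any, Dict


def clear_bookmark(state: Dict[str, Any], stream_id: str) -> Dict[str, Any]:
    """Return a copy of a Singer-style state with one stream's bookmark removed.

    Assumes the layout {"bookmarks": {stream_id: {...}}, "currently_syncing": ...}.
    An incremental stream with no bookmark is typically re-read from the start
    on the next run, i.e. a full re-sync of that stream only.
    """
    new_state = copy.deepcopy(state)  # never mutate the caller's state
    new_state.get("bookmarks", {}).pop(stream_id, None)
    if new_state.get("currently_syncing") == stream_id:
        new_state["currently_syncing"] = None
    return new_state


if __name__ == "__main__":
    # Hypothetical state with two incrementally replicated streams.
    state = {
        "currently_syncing": None,
        "bookmarks": {
            "public-orders": {"replication_key": "updated_at",
                              "replication_key_value": "2023-07-21T13:00:00"},
            "public-users": {"replication_key": "updated_at",
                             "replication_key_value": "2023-07-21T13:00:00"},
        },
    }
    print(sorted(clear_bookmark(state, "public-orders")["bookmarks"]))  # ['public-users']
```

Only the cleared stream would be re-read in full; the other streams keep their bookmarks and continue incrementally.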
Is there any way to ensure these row-count mismatches don't happen with incremental replication, even when an ELT run fails? Or at least a way to detect them automatically? Currently we check manually, which is not ideal, as an unnoticed missing source row could result in erroneous data in our marts. 😕
pat_nadolny
07/21/2023, 1:52 PM
huiming
07/25/2023, 9:44 AM