# troubleshooting
a
Hi everyone! :waving-wave: We have some issues in our pipeline, and I hope someone has encountered and solved this before. We’re using `tap-postgres` (pipelinewise version) and `target-redshift` (also the pipelinewise version). To keep the data fresh, we run our pipeline often, usually every 15 minutes. As some of the tables are very big, we use incremental replication. The problem is that we sometimes get mismatched row counts compared to the source: let’s say we have `4999876` rows in the Postgres source, and only `4999870` in the Redshift target, or something like that. This probably happens when a replication fails halfway for some reason, such as a connection issue. When we encounter these problems, we usually have to trigger a full replication by deleting some data from the system database. Is there any way to ensure these row count mismatches don’t happen with incremental replication, even when an ELT run fails? Or at least a way to detect them automatically? Currently we check manually, which is not ideal, as an unnoticed missing source row could result in erroneous data in our marts. 😕
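For the automatic detection part, a minimal sketch of a row count check could look like the following. Everything here is an assumption for illustration: the function name `find_row_count_mismatches` and the table names are hypothetical, and the two dictionaries are expected to be filled by your own `SELECT COUNT(*)` queries against Postgres and Redshift. With a 15-minute schedule the source keeps growing between runs, so both counts would likely need the same upper bound on the replication key to avoid false alarms.

```python
from typing import Dict, List, Tuple


def find_row_count_mismatches(
    source_counts: Dict[str, int],
    target_counts: Dict[str, int],
    tolerance: int = 0,
) -> List[Tuple[str, int, int]]:
    """Return (table, source_count, target_count) for every table whose
    counts differ by more than `tolerance` rows.

    The counts are assumed to come from COUNT(*) queries run elsewhere;
    this function only compares them.
    """
    mismatches = []
    for table, src in sorted(source_counts.items()):
        # A table missing from the target is treated as having zero rows.
        tgt = target_counts.get(table, 0)
        if abs(src - tgt) > tolerance:
            mismatches.append((table, src, tgt))
    return mismatches


if __name__ == "__main__":
    # Hypothetical table names; the counts mirror the example in the message.
    source = {"orders": 4999876, "users": 1200}
    target = {"orders": 4999870, "users": 1200}
    for table, src, tgt in find_row_count_mismatches(source, target):
        print(f"{table}: source={src} target={tgt} missing={src - tgt}")
```

Running a check like this after each sync and alerting on a non-empty result would replace the manual comparison, though it only detects the gap and does not explain it.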
p
@adrian_soltesz this sounds like it’s likely a bug 😕. Singer targets are supposed to emit their incremental state only once they’re completely confident that all the data in a batch has been loaded to the destination. This means the pipeline should be resilient to a failure halfway through a sync, and such a failure should not cause data loss.
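To illustrate the contract described above, here is a simplified sketch of a target loop that holds back STATE messages until the buffered records have been loaded. This is not the actual `target-redshift` code: `run_target`, `load_batch`, and `emit_state` are hypothetical names, and the only things taken from the Singer format are the `RECORD` and `STATE` message types with their `record` and `value` fields.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Optional


def run_target(
    lines: Iterable[str],
    load_batch: Callable[[List[Dict[str, Any]]], None],
    emit_state: Callable[[Dict[str, Any]], None],
    batch_size: int = 2,
) -> None:
    """Consume Singer messages, emitting state only after a successful load."""
    buffer: List[Dict[str, Any]] = []
    pending_state: Optional[Dict[str, Any]] = None

    for line in lines:
        msg = json.loads(line)
        if msg["type"] == "RECORD":
            buffer.append(msg["record"])
            if len(buffer) >= batch_size:
                load_batch(buffer)  # raises on failure, so no state is emitted
                buffer = []
                if pending_state is not None:
                    emit_state(pending_state)
                    pending_state = None
        elif msg["type"] == "STATE":
            if buffer:
                # Records are still unloaded: hold the state back.
                pending_state = msg["value"]
            else:
                # Everything received so far is loaded: safe to emit.
                emit_state(msg["value"])

    # Final flush at end of input.
    if buffer:
        load_batch(buffer)
    if pending_state is not None:
        emit_state(pending_state)
```

If `load_batch` fails, the last emitted state still points at the last fully loaded batch, so the next incremental run should pick up from there. A row count gap after a failed run suggests this ordering is being violated somewhere in the tap, the target, or the state handling around them.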
h
@pat_nadolny is there a way we can identify whether the bug is in the Singer tap or target?
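One possible way to narrow it down, offered as a sketch rather than an established procedure: capture the tap's output for a run into a file and count the `RECORD` messages per stream, then compare that with what landed in Redshift for the same run. If the tap never emitted the missing rows, the tap (or its bookmark logic) is the suspect; if it did emit them, the target or the state handling is. The helper name `count_records_per_stream` and the stream name in the example are hypothetical; the `type` and `stream` fields are part of the Singer message format.

```python
import json
from collections import Counter
from typing import Iterable


def count_records_per_stream(lines: Iterable[str]) -> Counter:
    """Count Singer RECORD messages per stream in captured tap output."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the capture
        msg = json.loads(line)
        if msg.get("type") == "RECORD":
            counts[msg["stream"]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical captured output; in practice, read the lines from a file.
    sample = [
        '{"type": "RECORD", "stream": "public-orders", "record": {"id": 1}}',
        '{"type": "RECORD", "stream": "public-orders", "record": {"id": 2}}',
        '{"type": "STATE", "value": {"bookmarks": {}}}',
    ]
    for stream, n in count_records_per_stream(sample).items():
        print(stream, n)
```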