Hi everyone waving wave We have some issues in our pipeline Meltano #troubleshooting

adrian_soltesz

07/21/2023, 1:12 PM

Hi everyone! :waving-wave: We have some issues in our pipeline, and I hope someone has encountered and solved this before. We’re using tap-postgres (pipelinewise version) and target-redshift (also the pipelinewise version). To keep data fresh, we run our pipeline often, usually every 15 mins. As some of the tables are very big, we use incremental replication. The problem is that sometimes we get mismatched row counts compared to the source: the Redshift target ends up with fewer rows than the Postgres source. This probably happens when a replication fails halfway for some reason, such as a connection issue. When we encounter these problems, we usually have to trigger a full replication by deleting some data from the system database. Is there any way to ensure these row count mismatches don’t happen using incremental replication, even when an ELT run fails? Or at least a way to detect this automatically? Currently we do it manually, which is not ideal, as an unnoticed missing source row could result in erroneous data in our marts. 😕
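
One way the automatic detection asked about here could be approached is a scheduled row-count comparison. The sketch below is only an illustration: `count_rows` and `find_mismatches` are hypothetical helper names, the connections are assumed to be any DB-API 2.0 connections (for example one to the Postgres source and one to the Redshift target), and table names and the optional `where` filter are assumed to come from trusted configuration.

```python
def count_rows(conn, table, where=None):
    """Return COUNT(*) for a table through any DB-API 2.0 connection."""
    # Assumption: `table` and `where` come from trusted config, never user input.
    sql = f"SELECT COUNT(*) FROM {table}"
    if where:
        sql += f" WHERE {where}"
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchone()[0]
    finally:
        cur.close()


def find_mismatches(source_conn, target_conn, table_pairs, where=None):
    """Compare counts for (source_table, target_table) pairs.

    Returns {source_table: (source_count, target_count)} for pairs that differ.
    """
    mismatches = {}
    for source_table, target_table in table_pairs:
        src = count_rows(source_conn, source_table, where)
        tgt = count_rows(target_conn, target_table, where)
        if src != tgt:
            mismatches[source_table] = (src, tgt)
    return mismatches
```

Because the pipeline runs every 15 minutes, the source will usually be slightly ahead of the target, so a check like this is probably only meaningful with a `where` cutoff on the replication key (rows older than the last completed run) or some tolerance.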

pat_nadolny

07/21/2023, 1:52 PM

@adrian_soltesz this sounds like it’s likely a bug 😕. Singer targets are supposed to emit their incremental state only once they’re completely confident that all the data in a batch has been loaded to the destination. This means a sync should be resilient to a failure halfway through and should not lose data.
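
The contract described above can be sketched roughly as follows. This is not the pipelinewise target-redshift code; `run_target`, `load_batch`, and `emit_state` are hypothetical names, and the only thing taken from the Singer spec is the message shape (JSON lines with a `type` of `RECORD` or `STATE`, where STATE carries a `value`).

```python
import json


def run_target(lines, load_batch, emit_state, batch_size=1000):
    """Consume Singer messages; emit a STATE value only after every
    RECORD that preceded it has been loaded successfully."""
    buffer = []
    pending_state = None  # state seen while records were still unloaded
    for line in lines:
        msg = json.loads(line)
        if msg["type"] == "RECORD":
            buffer.append(msg)
            if len(buffer) >= batch_size:
                load_batch(buffer)  # if this raises, no newer state is emitted
                buffer = []
                if pending_state is not None:
                    emit_state(pending_state)
                    pending_state = None
        elif msg["type"] == "STATE":
            if buffer:
                # Records before this state are not loaded yet: hold it back.
                pending_state = msg["value"]
            else:
                emit_state(msg["value"])
    # Flush whatever is left, then release the held-back state.
    if buffer:
        load_batch(buffer)
    if pending_state is not None:
        emit_state(pending_state)
```

If a target follows this ordering, a crash inside `load_batch` leaves the last emitted state pointing at data that is already in the destination, so the next incremental run re-reads the failed batch instead of skipping it.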

huiming

07/25/2023, 9:44 AM

@pat_nadolny is there a way we can identify whether the bug is in the Singer tap or target?
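
One possible way to narrow this down, offered as a sketch rather than an established procedure: capture the tap's stdout to a file during a run and summarize the Singer messages in it. `summarize_singer_output` is a hypothetical helper; it relies only on the Singer message format (JSON lines where RECORD messages carry a `stream` and STATE messages carry a `value`).

```python
import json
from collections import Counter


def summarize_singer_output(lines):
    """Count RECORD messages per stream and collect STATE values in order."""
    records = Counter()
    states = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the captured output
        msg = json.loads(line)
        if msg.get("type") == "RECORD":
            records[msg["stream"]] += 1
        elif msg.get("type") == "STATE":
            states.append(msg["value"])
    return records, states
```

If the per-stream record counts match what the source should have produced for that bookmark range but the destination has fewer rows, the target is the more likely suspect. If the tap itself emitted fewer records than expected, or a bookmark in the STATE values advanced past rows that never appeared as records, the tap is.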
