Hi, I’ve been seeing occasional rows missed when d...
# troubleshooting
d
Hi, I’ve been seeing occasional rows missed when doing an `INCREMENTAL` sync using the transferwise postgres tap. Does anyone have a good solution for this problem? Here is an example scenario:
Before meltano runs:
replication_key_value = t0
After meltano runs:
replication_key_value = t2
When meltano runs it selects everything newer than its bookmark timestamp (`t0`). Since `tx1` is still running, it only sees `tx2`.
`tx1` finishes, but since it wasn’t committed when meltano ran its select query, it won’t be seen in this meltano run. Its timestamp will be `t1`.
When meltano finishes it updates its bookmark using the latest value it’s seen, which is `t2` in this case. The next time meltano runs, it’ll select everything newer than its bookmark timestamp (`t2`). `tx1` will never be replicated.
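To make the race concrete, here is a small in-memory simulation of the steps above. It is only a sketch: `Row`, `incremental_sync`, and the integer times are made up for illustration, and the strict `>` comparison is an assumption taken from the "newer than" wording, not the tap's actual query.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    updated_at: int    # replication key value, set when the row is written
    committed_at: int  # time at which the writing transaction becomes visible


def incremental_sync(rows, bookmark, now):
    """Hypothetical sync: return (rows seen, new bookmark) for a run at `now`."""
    # Only committed rows newer than the bookmark are visible to the select.
    visible = [r for r in rows if r.committed_at <= now and r.updated_at > bookmark]
    # The bookmark advances to the largest replication key value seen.
    new_bookmark = max((r.updated_at for r in visible), default=bookmark)
    return visible, new_bookmark


# t0 = 0, t1 = 1, t2 = 2. tx1 writes at t1 but only commits at time 4.
rows = [
    Row("tx1", updated_at=1, committed_at=4),
    Row("tx2", updated_at=2, committed_at=2),
]

# First run at time 3: tx1 is still open, so only tx2 is seen.
first_seen, bookmark = incremental_sync(rows, bookmark=0, now=3)

# Second run at time 5: tx1 is committed, but its key (t1) is behind the bookmark (t2).
second_seen, bookmark_after = incremental_sync(rows, bookmark=bookmark, now=5)

print([r.name for r in first_seen], bookmark)
print([r.name for r in second_seen], bookmark_after)
```

The first run returns only `tx2` and moves the bookmark to `t2`; the second run returns nothing, so `tx1` is skipped for good.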
The only solution I’ve come up with is setting the bookmark back by something like an hour before each run.
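A rough sketch of that workaround, assuming a datetime replication key. `LOOKBACK`, `rewind_bookmark`, and `merge_by_primary_key` are hypothetical helper names, not part of Meltano or the tap, and how the rewound value gets written back into the stored state is left out here.

```python
from datetime import datetime, timedelta

# Assumed lookback window; it has to be longer than the longest-running write transaction.
LOOKBACK = timedelta(hours=1)


def rewind_bookmark(bookmark: datetime, lookback: timedelta = LOOKBACK) -> datetime:
    """Move the bookmark back so late-committing rows fall inside the next select."""
    return bookmark - lookback


def merge_by_primary_key(existing: dict, batch: list) -> dict:
    """Upsert a batch keyed on `id`, so re-read rows overwrite instead of duplicating."""
    merged = dict(existing)
    for row in batch:
        merged[row["id"]] = row
    return merged


# Illustrative values only.
bookmark = datetime(2023, 1, 1, 12, 0)
rewound = rewind_bookmark(bookmark)

# Row 2 was already loaded; the rewound run re-reads it and also picks up late row 1.
already_loaded = {2: {"id": 2, "updated_at": datetime(2023, 1, 1, 11, 50)}}
batch = [
    {"id": 1, "updated_at": datetime(2023, 1, 1, 11, 40)},
    {"id": 2, "updated_at": datetime(2023, 1, 1, 11, 50)},
]
merged = merge_by_primary_key(already_loaded, batch)
```

Rewinding means every run re-reads the last hour of rows, so this only works cleanly if the target upserts on a primary key. It also still misses any transaction that stays open longer than the lookback window.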
e
Yeah, I can't think of a better solution and it's come up a few times from folks here. I don't think the "-1 hour" solution can be generalized to non-datetime replication keys so I'm curious if there's another way to do it.