Meltano


Hi, I’ve been seeing occasional rows missed when doing an `INCREMENTAL` sync using the transferwise postgres tap. Does anyone have a good solution for this problem?

Here is an example scenario:

Before meltano runs: `replication_key_value = t0`
After meltano runs: `replication_key_value = t2`

When meltano runs it selects everything newer than its bookmark timestamp (`t0`). Since `tx1` is still running, it only sees `tx2`.

`tx1` finishes, but since it wasn’t committed when meltano ran its select query, it won’t be seen in this meltano run. Its timestamp will be `t1`.

When meltano finishes it updates its bookmark using the latest value it’s seen, which is `t2` in this case.

The next time meltano runs, it’ll select everything newer than its bookmark timestamp (`t2`).

`tx1` will never be replicated.
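To make the race concrete, here is a minimal simulation of the scenario above in plain Python. It is a sketch, not the tap's actual code: the `Row` class, the `incremental_sync` function, and the integer clock (`t0 = 0`, `t1 = 1`, `t2 = 2`) are all made up for illustration, and it assumes a strict "newer than the bookmark" comparison as described above.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    updated_at: int    # replication key value, set when the row was written
    committed_at: int  # time at which the writing transaction committed


def incremental_sync(rows, bookmark, now):
    """Hypothetical stand-in for one INCREMENTAL run.

    Only rows committed by `now` are visible to the select, and only
    rows with a replication key newer than the bookmark are returned.
    """
    visible = [r for r in rows if r.committed_at <= now and r.updated_at > bookmark]
    new_bookmark = max((r.updated_at for r in visible), default=bookmark)
    return visible, new_bookmark


# tx1 gets timestamp t1 = 1 but commits late (at time 3).
# tx2 gets timestamp t2 = 2 and commits right away.
rows = [
    Row("tx1", updated_at=1, committed_at=3),
    Row("tx2", updated_at=2, committed_at=2),
]

# First run at time 2: tx1 is still open, so only tx2 is seen.
first, bookmark = incremental_sync(rows, bookmark=0, now=2)

# Second run at time 4: tx1 is committed, but its key (1) is behind the bookmark (2).
second, bookmark = incremental_sync(rows, bookmark=bookmark, now=4)

replicated = [r.name for r in first + second]
print(replicated)
```

Across both runs only `tx2` is picked up, because the bookmark has already moved past `t1` by the time `tx1` becomes visible.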

The only solution I’ve come up with is setting the bookmark back by something like an hour before each run.
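A rough sketch of that workaround, assuming the state is a Singer-style dict with ISO 8601 timestamps under `bookmarks.<stream>.replication_key_value`. The `rewind_state` helper and the `public-orders` stream name are hypothetical, and how the state gets read from and written back to meltano is left out.

```python
import copy
from datetime import datetime, timedelta


def rewind_state(state, lookback=timedelta(hours=1)):
    """Return a copy of `state` with every timestamp bookmark moved back by `lookback`.

    Assumes each bookmark value is an ISO 8601 timestamp string.
    """
    rewound = copy.deepcopy(state)
    for bookmark in rewound.get("bookmarks", {}).values():
        value = bookmark.get("replication_key_value")
        if value is None:
            continue  # stream has no bookmark yet, nothing to rewind
        earlier = datetime.fromisoformat(value) - lookback
        bookmark["replication_key_value"] = earlier.isoformat()
    return rewound


# Hypothetical state, shaped like the example above.
state = {
    "bookmarks": {
        "public-orders": {
            "replication_key": "updated_at",
            "replication_key_value": "2024-01-01T12:00:00+00:00",
        }
    }
}

rewound = rewind_state(state)
print(rewound["bookmarks"]["public-orders"]["replication_key_value"])
```

Rows inside the lookback window get extracted again on every run, so this only works cleanly if the target upserts on the primary key. It also only helps when no transaction stays open longer than the lookback.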

Yeah, I can't think of a better solution and it's come up a few times from folks here. I don't think the "-1 hour" solution can be generalized to non-datetime replication keys so I'm curious if there's another way to do it.