Hello,
While reading the SDK code, I couldn’t really understand how it would perform with a stream of un-sorted records with a replication key.
https://gitlab.com/meltano/singer-sdk/-/blob/main/singer_sdk/helpers/_state.py#L178
Here, the code that increments the state doesn’t seem to check that the new replication key value is “after” the previous replication key value, does it?
edward_smith
04/27/2021, 2:12 PM
Is this the check you are looking for on line 200? ``if is_sorted and old_rk_value and old_rk_value > new_rk_value:``
pierre_de_poulpiquet
04/27/2021, 2:38 PM
I’m reading the code in the context of an un-sorted stream.
So is_sorted=false
pierre_de_poulpiquet
04/27/2021, 2:56 PM
An un-sorted stream will emit un-sorted replication key values (t0 -> t2 -> t1), and the condition you point to is skipped entirely because is_sorted is false.
So nothing catches a “new” replication key value that is older than the current one when the stream is not sorted.
aaronsteers
04/27/2021, 3:18 PM
@pierre_de_poulpiquet - Yes, I think you are correct. We have the comparison against the signpost (>=), but not against the previous bookmark value. It looks like this was a regression introduced during refactoring.
Will aim to publish a patch release (with added unit tests) within the next 24 hours, since this would also affect other developers who are intending to support unsorted streams.
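For readers following the thread, here is a minimal sketch of the kind of guard being discussed. It is not the SDK's actual code: `increment_bookmark` and the `replication_key_value` state key are hypothetical names chosen for illustration, and the sketch assumes replication key values can be compared with `<`. The idea is that an unsorted stream should only ever move the bookmark forward, while a sorted stream treats an out-of-order value as an error.

```python
from typing import Any, Dict


def increment_bookmark(
    state: Dict[str, Any],
    new_rk_value: Any,
    is_sorted: bool,
) -> None:
    """Hypothetical helper: advance the bookmark without moving it backwards."""
    old_rk_value = state.get("replication_key_value")

    if old_rk_value is not None and new_rk_value < old_rk_value:
        if is_sorted:
            # A sorted stream should never go backwards, so fail loudly.
            raise ValueError(
                f"Unsorted data in sorted stream: "
                f"{new_rk_value!r} < {old_rk_value!r}"
            )
        # Unsorted stream: keep the larger value already stored.
        return

    state["replication_key_value"] = new_rk_value


# The t0 -> t2 -> t1 sequence from the discussion, using integers.
state: Dict[str, Any] = {}
for value in [0, 2, 1]:
    increment_bookmark(state, value, is_sorted=False)

print(state["replication_key_value"])  # 2
```

With the guard in place, the late-arriving value 1 does not overwrite the bookmark of 2. Without it, the state would end at 1, and the next incremental run would re-read records it had already processed.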