# singer-tap-development
j
I'm wondering what the canonical way of incrementally dealing with data sorted in descending order is. 🧵
I'm working with a stream that returns a list of records, which is being used as the parent stream for a child that requests one record at a time. The delta in run time between incremental and full sync is ~6 hours, so I'm trying to make this incremental if I can.
The parent stream returns records in descending order of created date (the records are immutable so we don't need to worry about updates). There's no ability to filter the records.
My current approach is:
• On the first "page", save the first (most recent) timestamp to a tap attribute
• At the end of the run, write that timestamp to the state dictionary (manually)
• On incremental runs, check whether the last (oldest) timestamp in each page is before the timestamp from the state dictionary and, if so, don't fetch any more records.
This feels a bit hacky (particularly writing the state manually), so I'm wondering if there's a simpler approach that I'm missing.
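For reference, the approach above can be sketched in plain Python, independent of the SDK. Everything here is illustrative: `sync_descending`, `fake_pages`, and the `created_at` field are hypothetical names, and the sketch assumes timestamps are unique (a record created at exactly the bookmark is treated as already synced). It checks each record rather than only the last one in a page, so already-seen records in the boundary page are also skipped.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Iterable, Optional


def sync_descending(
    pages: Iterable[list[dict]],
    bookmark: Optional[datetime],
) -> tuple[list[dict], Optional[datetime]]:
    """Collect records newer than `bookmark` from pages sorted newest-first.

    Returns the new records plus the bookmark to store for the next run.
    """
    new_records: list[dict] = []
    newest_seen: Optional[datetime] = None
    for page in pages:
        for record in page:
            created = record["created_at"]
            if newest_seen is None:
                # The very first record is the most recent one overall.
                newest_seen = created
            if bookmark is not None and created <= bookmark:
                # Everything from here on is older, so stop paging.
                return new_records, max(newest_seen, bookmark)
            new_records.append(record)
    # Ran out of pages (for example on the first full sync).
    if newest_seen is None:
        return new_records, bookmark
    return new_records, newest_seen


# Fake paginated source that records which pages were actually fetched.
fetched_pages: list[int] = []


def day(d: int) -> datetime:
    return datetime(2022, 1, d, tzinfo=timezone.utc)


def fake_pages():
    data = [
        [{"id": 6, "created_at": day(6)}, {"id": 5, "created_at": day(5)}],
        [{"id": 4, "created_at": day(4)}, {"id": 3, "created_at": day(3)}],
        [{"id": 2, "created_at": day(2)}, {"id": 1, "created_at": day(1)}],
    ]
    for number, page in enumerate(data):
        fetched_pages.append(number)
        yield page


records, new_bookmark = sync_descending(fake_pages(), bookmark=day(3))
```

Because the pages come from a generator, returning early means the remaining pages are never requested. The returned bookmark would still need to be written to state at the end of the run, as in the manual step described above.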
a
Have you seen the docs entry regarding "signposts" and unsorted streams? https://sdk.meltano.com/en/latest/implementation/state.html#dealing-with-unsorted-streams
This might not directly pertain to your use case but perhaps there could be helpful info in there if you haven't read it already.
j
Right now I'm not telling the SDK that it's syncing incrementally (which might be a bad choice) so this doesn't apply as yet. The signpost logic is interesting but I think it's based off the assumption that you can filter records, which I can't.
a
Yeah. The other piece which comes to mind, which might be helpful, is leveraging get_starting_timestamp() for record exclusion logic.
j
Hmm, that's an interesting idea, I might chew that over some.
a
I've run into similar scenarios in the past, if I understand your case correctly. We treated it as incremental, but since the API couldn't be filtered, we essentially just dropped the records on the ground if they didn't meet the start value conditions.
We should add this use case to the docs I just linked to. It's definitely a case that comes up periodically and would be good to have a documented best practice approach.
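A minimal sketch of that "drop records on the ground" idea, in plain Python rather than actual SDK code. `drop_already_synced` and the `created_at` field are hypothetical; in a tap, the start value would presumably come from something like `get_starting_timestamp()` as mentioned earlier in the thread.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional


def drop_already_synced(
    records: Iterable[dict],
    start_value: Optional[datetime],
) -> Iterator[dict]:
    """Yield only records created strictly after `start_value`."""
    for record in records:
        # Assumes ISO 8601 timestamps with an explicit UTC offset.
        created = datetime.fromisoformat(record["created_at"])
        if start_value is not None and created <= start_value:
            continue  # Already synced on a previous run, so skip it.
        yield record


sample = [
    {"id": 3, "created_at": "2022-01-03T00:00:00+00:00"},
    {"id": 2, "created_at": "2022-01-02T00:00:00+00:00"},
    {"id": 1, "created_at": "2022-01-01T00:00:00+00:00"},
]
start = datetime.fromisoformat("2022-01-02T00:00:00+00:00")
kept = list(drop_already_synced(sample, start))
```

Unlike stopping pagination early, this still reads every parent record, but only the records that survive the filter would be passed on, which is where the per-record child requests happen.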
p
we discussed a similar question here btw. i couldn't get it to work as expected and sorta gave up for the time being https://meltano.slack.com/archives/C01PKLU5D1R/p1639760305331600
j
Thanks Prratek, I had read through that thread yesterday. In the end, the "hacky" option I describe above seems to be working well based off my testing and 1.5 days worth of runs.