is there a way to have SDK-based taps be resumable...
# singer-tap-development
p
is there a way to have SDK-based taps be resumable for streams that are reverse sorted? i’m trying to update this stripe tap to be resumable and the Stripe API’s list endpoints return data reverse-sorted by my replication key with no option to customize that behavior as far as i can tell
a
Do you know if the API supports getting data in batches, perhaps a month at a time so each month (although sorted descending in terms of individual records) could still form an ascending bookmarkable unit?
p
the replication key is a timestamp and i can pass a start and end value. how would i do that? would i be writing logic to chunk up the date range into batches and updating state myself or is there something in the SDK to support something like this?
a
I don't know of any samples doing this today, but in theory something like this could work:
1. Use a custom dict as your next_page_token and use that token to track an inner and an outer loop. The outer loop is weeks, days, or months and the inner loop is the built-in pagination token.
2. At each step in the outer loop (for instance, after completing each week of the outer loop), it should be safe to save a bookmark at that point.
3. At least in theory, the finalize_state() method can be called after each step in the outer loop, marking that position as resumable to the downstream.
I don't remember the exact name of the finalize method, but this should get you pretty close. Do you think that might work for your use case?
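A minimal sketch of the composite token from step 1, in plain Python with no SDK calls. The token shape (`window_start`, `window_end`, `cursor`) and the helper names `first_token` and `next_token` are hypothetical, chosen for illustration; how this plugs into a tap's pagination hook is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def first_token(start: datetime, end: datetime, step: timedelta) -> dict:
    """Build the token for the first (oldest) window. Hypothetical shape."""
    return {
        "window_start": start,
        "window_end": min(start + step, end),
        "cursor": None,  # inner pagination cursor, None means first page
    }


def next_token(
    token: dict,
    inner_cursor: Optional[str],
    end: datetime,
    step: timedelta,
) -> Optional[dict]:
    """Advance the composite token by one page or one window."""
    if inner_cursor is not None:
        # Inner loop: stay in the same window, move to the next page.
        return {**token, "cursor": inner_cursor}
    # The window is exhausted. This is the point where a bookmark
    # for token["window_end"] could be saved.
    new_start = token["window_end"]
    if new_start >= end:
        return None  # outer loop finished, no more windows
    return {
        "window_start": new_start,
        "window_end": min(new_start + step, end),
        "cursor": None,
    }


start = datetime(2022, 1, 1, tzinfo=timezone.utc)
end = datetime(2022, 1, 3, tzinfo=timezone.utc)
token = first_token(start, end, timedelta(days=1))
```

Windows advance oldest first, so even though records inside a window arrive newest first, each completed window is an ascending checkpoint.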
p
That sounds like it should work! I'll let you know when I give it a shot. Thanks AJ
a
Great! And in the process, if you have a tiny bit of time to contribute back to our docs, it would be really helpful to add an example of "inner+outer loop pagination" to the code samples page on sdk.meltano.com. Similar topics come up periodically regarding advanced pagination, and it would be super helpful to have a demo that new devs could use as a reference.
p
absolutely! probably going to try the approach you suggested next week but i’d be happy to contribute an example
as far as i can tell that approach doesn’t seem to have worked. i took a stab at paginating one day at a time here and calling self.finalize_state_progress_markers() after each iteration of the outer loop. however, a stream was interrupted after getting through a solid year’s worth of data and when the next run started the state emitted showed it starting back at the start date. any idea what i might be doing wrong? i know the pagination through dates itself is working because i can see logs from the Stripe client library i’m using that show the replication key filters changing as expected every so often.
okay coming back to this, i wanted to see if anyone has thoughts - i have an SDK based tap for Stripe, which returns reverse sorted data. i'm making one day chunks, iterating through them, and calling finalize_state_progress_markers() at the end of each iteration. i logged self.stream_state at the end of an iteration to confirm it was working and i do see the state i pushed
{'starting_replication_value': '2022-01-01T00:00:00Z', 'progress_markers': {'Note': 'Progress is not resumable if interrupted.', 'replication_key': 'created', 'replication_key_value': 1641237037}}
however, the next run starts right back at the beginning if that stream doesn't run to completion and starts with a
[warning  ] No state was found, complete import.
the repo is here if that helps. without this working my first run would take a couple of days 😬
d
Hey Prratek, you are using target-bigquery, right?
p
yes - are you thinking it might be related to this? https://meltano.slack.com/archives/C01V8L0NZC1/p1645481978427519
d
Yea, I worked with Ruslan on merge_state_messages since the default target behavior does some merging of state messages. Wasn't sure if that was impacting what you are seeing
p
yeah let me set merge_state_messages and see if that helps
d
yea, set it to False
a
@prratek_ramchandani - re:
however, the next run starts right back at the beginning if that stream doesn't run to completion
Reverse-sorted streams aren't resumable until the stream completes its full iteration. What you could do to mitigate is wrap the requests in time blocks that are processed in ascending order, and then finalize the state marker once the stream has caught up at least to that checkpoint.
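A rough sketch of that time-block idea, in plain Python rather than SDK code. The function `sync_in_windows`, the `fetch_window` callable, and the message tuples are all hypothetical stand-ins; in an SDK tap the checkpoint step would correspond to finalizing the state marker, and this sketch makes no claim about how the SDK or a target handles that state.

```python
from typing import Callable, Iterable, Iterator, Tuple

Window = Tuple[int, int]  # (start, end) as epoch seconds, end exclusive


def sync_in_windows(
    windows: Iterable[Window],
    fetch_window: Callable[[int, int], Iterable[dict]],
) -> Iterator[Tuple[str, dict]]:
    """Yield records window by window, with a checkpoint after each window.

    `windows` must be ordered oldest first. `fetch_window` is a hypothetical
    callable returning the records with start <= created < end, in any order
    (for example newest first, as a reverse-sorted API would).
    """
    for start, end in windows:
        for record in fetch_window(start, end):
            yield ("RECORD", record)
        # Everything older than `end` has now been emitted, so `end` is a
        # safe value to resume from if a later window is interrupted.
        yield ("STATE", {"replication_key_value": end})
```

If a run is interrupted partway through a window, the last `STATE` message still points at the end of the previous window, so only the unfinished window needs to be fetched again.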
Just logged 👆