# getting-started
j
Lightly playing around with writing custom taps. Some REST endpoints don’t offer a nice way to maintain state of what’s been ‘seen’, as you can’t filter by ‘updated_at’. The API does offer a filter by day, like api.example.com/sales?started_at=2023-08-01… But a tap would need extra knowledge of how far back to look and which records are still ‘open’ to being updated. The records can’t be modified after, say, 30d, so I’m thinking of a tap that recrawls a fair bit of history, essentially starting at currentday-30. However, I’m not sure how to do this since each day’s result can itself be paginated. So I feel like I need two cursors: one to walk the API calls up to current_date and one to follow pagination if needed. Any suggestions?
v
Looks pretty easy to me if you have started_at. If it's 30d then change started_at to today-30d, done
j
I think I’m missing what aspect of my tap should be modified to update the parameter on the next ‘round’ of calls
v
next round means next pipeline run?
Again it'd be today-30d
j
Sorry, don’t know the right terminology… Inside a single run:
start_date=current_date-30
Tap starts a fetch at t-30
• follows pagination
• follows pagination
Tap needs to run at t-29 <- where should this logic live?
v
Is api.example.com/sales?started_at=2023-08-01 not a conditional for started > 2023-08-01? It's only the exact day? That must be what I'm missing
You can write a custom paginator, then plop your logic in there!
then you'd have both "cursors" I think that's your question 😄
j
The API would only return records on the specific date. So If I’m calling 30d of history I’m essentially making 30 unique “start” api requests and then following pagination until it’s done
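A rough, SDK-agnostic sketch of the "two cursors" described above: an outer loop over days and an inner loop over pages. `crawl_window` and the `fetch_page(day, page_token)` callable are hypothetical names made up for illustration; the real API's pagination shape isn't shown in this thread, so a generic "next token or None" is assumed.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical callable: returns (records, next_page_token or None) for one day.
FetchPage = Callable[[date, Optional[str]], Tuple[List[Dict], Optional[str]]]


def crawl_window(
    fetch_page: FetchPage, today: date, lookback_days: int = 30
) -> Iterator[Dict]:
    """Yield every record from today - lookback_days through today."""
    day = today - timedelta(days=lookback_days)
    while day <= today:  # outer cursor: one "start" request per day
        token: Optional[str] = None
        while True:  # inner cursor: follow pagination within that day
            records, token = fetch_page(day, token)
            yield from records
            if token is None:
                break
        day += timedelta(days=1)
```

With a 30-day lookback this makes at least 31 first-page requests (t-30 through today inclusive), plus one more per extra page.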
v
Or you could think about it as your paginator includes days
j
I think that works. If the response block doesn’t have the pagination link, increment the day. Yeah, that seems pretty good. I guess if the
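The "paginator includes days" idea can be sketched as a single combined cursor: keep following the page token while there is one, otherwise roll over to the next day, and stop after today. This is plain Python rather than the SDK's paginator classes; `Cursor` and `advance` are hypothetical names, and the "next token or None" response shape is an assumption.

```python
from datetime import date, timedelta
from typing import NamedTuple, Optional


class Cursor(NamedTuple):
    day: date
    page_token: Optional[str]  # None means "first page of this day"


def advance(
    cursor: Cursor, next_page_token: Optional[str], today: date
) -> Optional[Cursor]:
    """Return the next cursor, or None when the whole window is done."""
    if next_page_token is not None:
        # Still inside the same day: just follow pagination.
        return Cursor(cursor.day, next_page_token)
    next_day = cursor.day + timedelta(days=1)
    if next_day > today:
        return None  # no more pages and no more days
    return Cursor(next_day, None)  # start the next day from its first page
```

The same decision could live inside a custom paginator's "get next" hook, with the cursor being whatever value the paginator hands back to the stream.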
v
The other option is you could think about them as partitions in the sdk, and create a partition for every day
I think combining them into one paginator is probably easier but 🤷
beauty of the sdk is the other option is just write the logic in get_records and ignore all the pagination / partition stuff
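For the partition option mentioned above, a sketch of building one partition context per day. As far as I understand it, an SDK stream can expose its partitions as a list of dicts, but that wiring is not shown here; `daily_partitions` is a hypothetical helper and the `started_at` key is simply the query parameter from this thread.

```python
from datetime import date, timedelta
from typing import Dict, List


def daily_partitions(today: date, lookback_days: int = 30) -> List[Dict[str, str]]:
    """One context dict per day, from today - lookback_days through today."""
    start = today - timedelta(days=lookback_days)
    return [
        {"started_at": (start + timedelta(days=offset)).isoformat()}
        for offset in range(lookback_days + 1)
    ]
```

Each dict would then be used to fill in the `started_at` query parameter for that partition's requests, leaving ordinary pagination to handle the pages within a day.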
j
Thanks for your suggestions and help Derek. I have the tap up and running to a postgres instance, but my Paginator is kinda gross. Because I need the query parameter initiated at to exist on the first query, I don’t believe I can use only the Paginator. I had to adjust the get_params function of my stream to populate the default params block with the starting started_at date. Then my paginator parses the qsl for the initiated at date, walks it forwards, and reapplies it.
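A minimal sketch of the "parse the qsl, walk the date forward, reapply it" step using only the standard library. This is not the code from the tap itself, which wasn't shared; `advance_day` is a hypothetical helper, and the parameter name and ISO date format are assumptions based on the URL earlier in the thread.

```python
from datetime import date, timedelta
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def advance_day(url: str, param: str = "started_at") -> str:
    """Return the same URL with the date in `param` moved forward one day."""
    parts = urlsplit(url)
    # dict() keeps only the last value for a repeated key; fine for this sketch.
    params = dict(parse_qsl(parts.query))
    next_day = date.fromisoformat(params[param]) + timedelta(days=1)
    params[param] = next_day.isoformat()
    return urlunsplit(parts._replace(query=urlencode(params)))
```

If the API also carries a page token in the query string, it would need to be dropped at the same point so the new day starts from its first page.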
v
Glad it works! If you share your code, ideally the GitHub repo, someone here may be able to provide pointers!