Hi, I am looking to use the <BaseOffsetPaginator> ...
# singer-tap-development
s
Hi, I am looking to use the BaseOffsetPaginator, but I would like to pass in a parameter such as the JSONPath to where the token is located in the response. I'm trying to write a generic function that I can use for many types of APIs, and the tokens I need are in a different location for each one. For example, on the NOAA website they are located under metadata -> resultset. Ideally, if I could pass in the location using NEXT_PAGE_TOKEN_PATH='$.metadata.resultset', I could then directly access the tokens limit, count, and offset.
{
	"results": [
		{
			"id": "GSOY",
			"name": "Global Summary of the Year",
			"datacoverage": 1,
			"mindate": "1763-01-01",
			"maxdate": "2015-01-01"
		},
		...
	],
	"metadata": {
		"resultset": {
			"limit": 25,
			"count": 11,
			"offset": 1
		}
	}
}
This is my code at the moment, but I would really like to pass in the JSONPath so I can use it to obtain the token. Any suggestions would be much appreciated.
class RestAPIOffsetPaginator(BaseOffsetPaginator):
    def has_more(self, response: requests.Response) -> bool:
        """Return True if there are more pages to fetch.

        Args:
            response: The most recent response object.

        Returns:
            Whether there are more pages to fetch.
        """

        pagination = response.json().get('metadata', {}).get('resultset')
        if pagination and all(x in pagination for x in ["offset", "limit"]):
            if pagination["offset"]:
                return True
        return False
And I call it like so.
def get_new_paginator(self):
    """Return the requested paginator required to retrieve all data from the API.

    Returns:
        Paginator class.
    """
    return RestAPIOffsetPaginator(start_value=0, page_size=25)
I would ideally like to pass in the JSONPath so I can use it in my custom paginator like so. Please note another API may have the tokens in another location like pagination so I would like it to be a passed parameter. Any assistance would be much appreciated. Thanks
return RestAPIOffsetPaginator(start_value=0, page_size=25, jsonpath=self.next_page_token_jsonpath)
r
Something like this?
from requests import Response
from singer_sdk.helpers.jsonpath import extract_jsonpath
from singer_sdk.pagination import BaseOffsetPaginator


class RestAPIOffsetPaginator(BaseOffsetPaginator):
    def __init__(self, *args, jsonpath: str = None, **kwargs):
        super().__init__(*args, **kwargs)
        self._jsonpath = jsonpath

    def has_more(self, response: Response) -> bool:
        pagination = next(extract_jsonpath(self._jsonpath, response.json()), None)

        if not pagination:
            return False

        limit: int = pagination["limit"]
        count: int = pagination["count"]

        return limit == count
In your example, I don't understand how you are determining if there are more records to fetch or not based on offset and limit (as far as I can tell, you're just checking that they both exist in resultset). My example compares limit and count instead (i.e. if the page is full, there must be more), but I'm just assuming how the API works so could be wrong.
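If count turns out to be the total number of matching records rather than the number on the current page (an assumption; the thread does not confirm either reading), comparing limit and count would not signal the last page reliably. A minimal sketch of that alternative check is below, written as a standalone helper that has_more could call with the extracted pagination object. The names has_more_records and first_index are hypothetical, and the default of 1 only reflects the "offset": 1 shown in the sample response.

```python
from typing import Any, Mapping, Optional


def has_more_records(
    pagination: Optional[Mapping[str, Any]],
    first_index: int = 1,
) -> bool:
    """Return True if records remain beyond the current page.

    Assumption (not confirmed in the thread): "count" is the total number of
    matching records and "offset" is the index of the first record on this
    page, counted from first_index.
    """
    if not pagination:
        return False
    try:
        offset = int(pagination["offset"])
        limit = int(pagination["limit"])
        count = int(pagination["count"])
    except (KeyError, TypeError, ValueError):
        # Missing or non-numeric fields: stop paginating rather than loop.
        return False

    # Index of the last record the current page could hold.
    last_index_on_page = offset + limit - 1
    # Index of the last record in the whole result set.
    final_index = first_index + count - 1
    return last_index_on_page < final_index
```

With the sample response above (limit 25, count 11, offset 1) this reports no further pages, since all 11 records fit on the first page.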
s
Brilliant, that looks like it will work really well. Thanks for sending that through - very helpful. I get your point re the limit == count. It seemed to be working for the API but I prefer what you have proposed. Thanks again.
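To illustrate the earlier point that another API may keep these fields somewhere else (for example under pagination), here is a small dependency-free sketch of looking up the pagination object from a path passed in as a parameter. lookup_simple_path is a hypothetical, simplified stand-in for the SDK's extract_jsonpath: it handles only plain dotted keys, with no wildcards, filters, or array indexes. The second payload shape is invented for the example.

```python
from typing import Any, Optional


def lookup_simple_path(path: str, payload: Any) -> Optional[Any]:
    """Resolve a dotted path such as '$.metadata.resultset' in parsed JSON.

    Simplified stand-in for a real JSONPath engine: only object keys
    separated by dots are supported.
    """
    node = payload
    # Drop the leading "$." root marker, then walk one key at a time.
    for key in path.lstrip("$.").split("."):
        if not key:
            continue
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Response shaped like the NOAA sample from the thread.
noaa_like = {
    "results": [],
    "metadata": {"resultset": {"limit": 25, "count": 11, "offset": 1}},
}
# Hypothetical second API that keeps the same fields under "pagination".
other_api = {
    "data": [],
    "pagination": {"limit": 100, "count": 250, "offset": 0},
}

noaa_page = lookup_simple_path("$.metadata.resultset", noaa_like)
other_page = lookup_simple_path("$.pagination", other_api)
```

Only the path string changes between the two APIs, which is the same idea as passing jsonpath into the paginator's constructor.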