`get_starting_timestamp()` requires a context obje...
# troubleshooting
a
`get_starting_timestamp()` requires a context object as a parameter. Where do I find that context object at runtime?
e
It should be available in the scope of the caller, e.g. `get_url_params(self, context, next_page_token)`. Where are you trying to call it?
a
Because of how the API is implemented, our requests must sort descending by the replication key and then stop paginating once we reach our bookmark timestamp. That's why I need access to the context: to get the latest timestamp.
```python
def get_new_paginator(self) -> BaseAPIPaginator:
    """overrides base class method to return a paginator"""
    bookmarked_updated_at = self.get_starting_timestamp(None)
    return KrowPaginator(bookmarked_timestamp=bookmarked_updated_at)
```
In that code, `get_starting_timestamp(None)` returns the bookmark value for a parent stream when using this state:
```json
{
  "bookmarks": {
    "organizations": {
      "replication_key": "updated_at",
      "replication_key_value": "2023-03-22T02:03:52.751Z"
    }
  }
}
```
But it returns `None` for child streams when using this state:
```json
{
  "bookmarks": {
    "campaigns": {
      "partitions": [
        {
          "context": {
            "organization_id": "0715aee1-54f9-4707-bb67-5950dc695ce7"
          },
          "replication_key": "updated_at",
          "replication_key_value": "2023-03-21T23:13:38.720Z"
        }
      ],
      "replication_key_value": "2023-03-21T23:13:38.720Z"
    },
    "organizations": {}
  }
}
```
Yeah. So I think this boils down to: I need the child's replication key value during the call to `get_new_paginator`, but the context at that moment has ALL the partitions in it, with no way to distinguish which is the current partition's value. And a call to `get_starting_timestamp(None)` returns `None` instead of the current child partition's replication key value.
From https://sdk.meltano.com/en/latest/implementation/state.html#partitioned-state: "For parent-child streams, the SDK will automatically use the parent’s context as the default state partition." That describes the behavior I want, but not what I am seeing.
And https://sdk.meltano.com/en/latest/context_object.html#the-context-object indicates that the context (and associated functions like `get_starting_timestamp`) should be pre-filtered to the current partition: "Many of the methods in the Stream class and its subclasses accept a `context` parameter, which is a dictionary that contains information about the stream partition or parent stream."
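To make the partition question concrete, here is a minimal sketch of how a bookmark could be picked out of the state shown above when the partition's context is known. `partition_bookmark` is a hypothetical helper written for this thread, not the SDK's internal lookup; it only assumes the state layout from the JSON above.

```python
from typing import Optional

# State copied from the child-stream example earlier in the thread.
state = {
    "bookmarks": {
        "campaigns": {
            "partitions": [
                {
                    "context": {
                        "organization_id": "0715aee1-54f9-4707-bb67-5950dc695ce7"
                    },
                    "replication_key": "updated_at",
                    "replication_key_value": "2023-03-21T23:13:38.720Z",
                }
            ],
            "replication_key_value": "2023-03-21T23:13:38.720Z",
        },
        "organizations": {},
    }
}


def partition_bookmark(state: dict, stream_name: str, context: dict) -> Optional[str]:
    """Hypothetical helper: return the bookmark of the partition matching `context`.

    Returns None when the stream has no partition with exactly this context.
    """
    stream_state = state.get("bookmarks", {}).get(stream_name, {})
    for partition in stream_state.get("partitions", []):
        if partition.get("context") == context:
            return partition.get("replication_key_value")
    return None


current = {"organization_id": "0715aee1-54f9-4707-bb67-5950dc695ce7"}
bookmark = partition_bookmark(state, "campaigns", current)
```

The point of the sketch is that the lookup needs the current partition's context as its key; without it there is nothing to match against in `partitions`.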
d
I solved it by caching the `starting_timestamp` value as a stream attribute:
```python
def get_records(self, context: Optional[dict]) -> Iterable[dict[str, Any]]:
    self._starting_timestamp = self.get_starting_timestamp(context)
    yield from super().get_records(context=context)
```
And passing its value to a custom paginator:
```python
def get_new_paginator(self) -> CustomPaginator:
    return CustomPaginator(
        start_value=0,
        page_size=self._items_per_request_limit,
        starting_date=self._starting_timestamp,
    )
```
The cleaner way IMO would be to add the context to the `get_new_paginator` call in `RESTStream.request_records`, since it already has the context: https://github.com/meltano/sdk/blob/47290d0470a38a9e978b53a7436212c43b605f2d/singer_sdk/streams/rest.py#L354
```python
paginator = self.get_new_paginator(context)
```
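For reference, the stop condition the paginator needs (results sorted descending by `updated_at`, stop once the bookmark is reached) might look roughly like this. It is a standalone sketch: `BookmarkStopPaginator` and `fetch_new_records` are hypothetical stand-ins that do not subclass or call anything from `singer_sdk`, and the in-memory `pages` list stands in for API responses.

```python
from datetime import datetime
from typing import Iterator, List, Optional


def parse_ts(value: str) -> datetime:
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class BookmarkStopPaginator:
    """Hypothetical page-number paginator that stops at a bookmark timestamp."""

    def __init__(self, start_value: int, starting_date: Optional[datetime]) -> None:
        self.page = start_value
        self.starting_date = starting_date
        self.finished = False

    def advance(self, records: List[dict]) -> None:
        """Decide whether another page is needed after seeing `records`."""
        if not records:
            self.finished = True
            return
        # Records are sorted descending, so the last one is the oldest.
        oldest = parse_ts(records[-1]["updated_at"])
        if self.starting_date is not None and oldest <= self.starting_date:
            self.finished = True
        else:
            self.page += 1


def fetch_new_records(
    pages: List[List[dict]], starting_date: Optional[datetime]
) -> Iterator[dict]:
    """Yield only records newer than the bookmark, stopping early when possible."""
    paginator = BookmarkStopPaginator(start_value=0, starting_date=starting_date)
    while not paginator.finished:
        records = pages[paginator.page] if paginator.page < len(pages) else []
        for record in records:
            if starting_date is None or parse_ts(record["updated_at"]) > starting_date:
                yield record
        paginator.advance(records)


pages = [
    [
        {"id": 3, "updated_at": "2023-03-23T00:00:00.000Z"},
        {"id": 2, "updated_at": "2023-03-22T12:00:00.000Z"},
    ],
    [
        {"id": 1, "updated_at": "2023-03-21T00:00:00.000Z"},
        {"id": 0, "updated_at": "2023-03-20T00:00:00.000Z"},
    ],
]
bookmark = parse_ts("2023-03-22T02:03:52.751Z")
new_ids = [r["id"] for r in fetch_new_records(pages, bookmark)]
```

The sketch skips records equal to the bookmark; whether to use `>` or `>=` there is a judgment call that depends on how the API rounds timestamps.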
a
@Denis I. thank you for taking the time to share this with me
This solution worked. @edgar_ramirez_mondragon I agree with @Denis I.’s proposed solution to make context accessible to the paginator. That seems cleaner
e
Yeah, I can’t see any harm in being able to use the context object when initializing the pagination class. I’ve created an issue for it: https://github.com/meltano/sdk/issues/1520