# singer-tap-development
a
Can I use state just to limit the amount of data getting written to the target? I.e. the API doesn't support a `since` timestamp parameter, it just returns all appropriate records, but I can override `get_records` and only yield rows where `timestamp` is after the last time I ran. How can I get the relevant state in `get_records` if this is a child stream?
    def get_records(self, context: dict | None) -> Iterable[dict[str, Any]]:
        for record in self.request_records(context):
            transformed_record = self.post_process(record, context)
            if self.stream_state:  # do something here?
                yield record
d
Take a look at incremental replication: https://sdk.meltano.com/en/latest/incremental_replication.html In short, `replication_key` and `get_starting_timestamp` should help you skip unnecessary records.
a
That was simple, thank you!
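    # Needs at module level: from dateutil.parser import isoparse
    replication_key = 'timestamp'
    is_sorted = False  # records are not assumed to arrive in timestamp order

    def get_records(self, context: dict | None) -> Iterable[dict[str, Any]]:
        # None on a first run when there is no state and no start_date
        start = self.get_starting_timestamp(context)
        for record in self.request_records(context):
            transformed_record = self.post_process(record, context)
            if transformed_record is None:
                continue  # post_process can drop a record by returning None
            # Yield the processed record, keeping ties with the bookmark
            if start is None or isoparse(transformed_record['timestamp']) >= start:
                yield transformed_record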
I was curious as to whether the state would save the maximum timestamp seen in the data, or the timestamp of the run. It looks like the maximum timestamp of the data.
And I should be using `>=` in the comparison, not `>`, to keep with the expected semantics?
d
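In general, the state’s key value is the maximum successfully processed value. The moment it gets recorded depends on the `is_sorted` / `check_sorted` values: as I understand the docs, a sorted stream advances its bookmark as records are processed, while an unsorted stream only finalizes it once the whole stream has completed. https://github.com/meltano/sdk/blob/2cb74b54f9694b4acc9df5d0e5892616301d2f39/singer_sdk/streams/core.py#L742 https://sdk.meltano.com/en/latest/implementation/state.html#the-impact-of-sorting-on-incremental-sync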
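To make the sorting point concrete, here is a toy model of how I read the linked docs. It is not SDK code: `simulate_sync`, `interrupt_after` and the integer "timestamps" are all made up for illustration, and the real logic lives in the `core.py` link above.

```python
def simulate_sync(timestamps, is_sorted, interrupt_after=None):
    """Return the bookmark a later run could resume from (toy model only)."""
    bookmark = None          # value a later run would start from
    progress_marker = None   # provisional maximum for unsorted streams
    for count, ts in enumerate(timestamps, start=1):
        if is_sorted:
            # Sorted stream: every processed record can safely advance the bookmark.
            bookmark = ts
        else:
            # Unsorted stream: only track a provisional maximum for now.
            progress_marker = ts if progress_marker is None else max(progress_marker, ts)
        if interrupt_after is not None and count >= interrupt_after:
            # Simulated crash: the provisional marker is not promoted.
            return bookmark
    if not is_sorted:
        # Full pass finished, so the maximum seen becomes the bookmark.
        bookmark = progress_marker
    return bookmark


sorted_done = simulate_sync([1, 2, 3], is_sorted=True)
sorted_crashed = simulate_sync([1, 2, 3], is_sorted=True, interrupt_after=2)
unsorted_done = simulate_sync([3, 1, 2], is_sorted=False)
unsorted_crashed = simulate_sync([3, 1, 2], is_sorted=False, interrupt_after=2)
```

In this model an interrupted sorted run can still resume from the last record it processed, while an interrupted unsorted run keeps no new bookmark, because an earlier, unseen record might still have been waiting further down the response.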
The comparison depends on the source data. In most cases `>=` is safer, in case new records with the same timestamp appear in the source later.
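A quick sketch of why, using plain Python rather than the SDK. The `filter_new` helper, the `inclusive` flag and the sample records are hypothetical, just to show the tie case.

```python
from datetime import datetime, timezone


def filter_new(records, bookmark, inclusive=True):
    """Yield records at or after the bookmark (strictly after if not inclusive)."""
    for record in records:
        ts = datetime.fromisoformat(record["timestamp"])
        if ts > bookmark or (inclusive and ts == bookmark):
            yield record


# Bookmark left behind by the previous run.
bookmark = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

records = [
    {"id": 1, "timestamp": "2024-01-01T12:00:00+00:00"},  # synced by the previous run
    {"id": 2, "timestamp": "2024-01-01T12:00:00+00:00"},  # appeared later, same timestamp
    {"id": 3, "timestamp": "2024-01-01T12:05:00+00:00"},  # genuinely newer
]

inclusive_ids = [r["id"] for r in filter_new(records, bookmark, inclusive=True)]
exclusive_ids = [r["id"] for r in filter_new(records, bookmark, inclusive=False)]
```

With `>` the late record `id=2` is silently skipped. With `>=` it comes through, at the cost of re-sending `id=1`, so this assumes the target can cope with duplicates (for example by upserting on a primary key).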