Can I use state just to limit the amount of data getting wri Meltano #singer-tap-development

Andy Carter

04/17/2023, 8:55 AM

Can I use state just to limit the amount of data getting written to the target? That is, the API doesn't support a `since` timestamp parameter; it just returns all appropriate records. But I can override `get_records` and only yield rows where `timestamp` is after the last time I ran. How can I get the relevant state in `get_records` if this is a child stream?

def get_records(self, context: dict | None) -> Iterable[dict[str, Any]]:
    for record in self.request_records(context):
        transformed_record = self.post_process(record, context)
        if self.stream_state:  # do something here?
            yield record

Denis I.

04/17/2023, 10:49 AM

Take a look at incremental replication: https://sdk.meltano.com/en/latest/incremental_replication.html In short, `replication_key` and `get_starting_timestamp` should help you to skip unnecessary records.

Andy Carter

04/17/2023, 11:35 AM

That was simple, thank you!

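    # Class-body excerpt; requires `from dateutil.parser import isoparse`.
    replication_key = 'timestamp'
    is_sorted = False

    def get_records(self, context: dict | None) -> Iterable[dict[str, Any]]:
        # None on the first run, when there is no state or start_date yet.
        start = self.get_starting_timestamp(context)
        for record in self.request_records(context):
            transformed_record = self.post_process(record, context)
            if transformed_record is None:  # post_process may drop a record
                continue
            if start is None or isoparse(transformed_record['timestamp']) >= start:
                yield transformed_record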

I was curious as to whether the state would save the maximum timestamp seen in the data, or the timestamp of the run. It looks like the maximum timestamp of the data.

Andy Carter

04/17/2023, 11:35 AM

And I should be using `>=` in the comparison, not `>`, to keep with the expected semantics?

Denis I.

04/17/2023, 11:49 AM

In general, the state's key value is the maximum successfully processed value. The moment it is acquired depends on the `is_sorted` and `check_sorted` values. https://github.com/meltano/sdk/blob/2cb74b54f9694b4acc9df5d0e5892616301d2f39/singer_sdk/streams/core.py#L742 https://sdk.meltano.com/en/latest/implementation/state.html#the-impact-of-sorting-on-incremental-sync
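To make the sorted vs. unsorted difference concrete, here is a toy model in plain Python. It is a sketch of the idea described in the linked state docs, not the SDK's actual code: `simulate_sync` and `interrupt_after` are made-up names, and the state dictionary is simplified to the two keys that matter here.

```python
from __future__ import annotations


def simulate_sync(
    values: list[str], is_sorted: bool, interrupt_after: int | None = None
) -> dict:
    """Toy model of bookmark handling (hypothetical helper, not SDK code)."""
    state: dict = {}
    for count, value in enumerate(values, start=1):
        if is_sorted:
            # Sorted stream: every record can advance the bookmark directly.
            state["replication_key_value"] = value
        else:
            # Unsorted stream: keep the running maximum as a provisional marker.
            marker = state.setdefault("progress_markers", {})
            marker["replication_key_value"] = max(
                value, marker.get("replication_key_value", value)
            )
        if interrupt_after is not None and count >= interrupt_after:
            return state  # run aborted before the stream finished
    # Stream completed: promote the provisional marker to the real bookmark.
    if "progress_markers" in state:
        marker = state.pop("progress_markers")
        state["replication_key_value"] = marker["replication_key_value"]
    return state


unsorted_values = ["2023-04-15", "2023-04-17", "2023-04-16"]
print(simulate_sync(unsorted_values, is_sorted=False))
print(simulate_sync(unsorted_values, is_sorted=False, interrupt_after=2))
```

In this model a completed unsorted run ends with the maximum value as its bookmark, while an interrupted unsorted run only holds a provisional marker, so the next run cannot safely resume from it.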

Denis I.

04/17/2023, 11:52 AM

The comparison depends on the source data. In most cases `>=` is safer, in case new records with the same timestamp appear in the source later.
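A small standalone sketch of that trade-off, using made-up records and a hypothetical `select_new` helper (nothing here comes from the SDK). It assumes the bookmark equals the newest timestamp from the previous run and that a second record with that same timestamp shows up afterwards.

```python
from __future__ import annotations

import operator
from datetime import datetime, timezone


def select_new(records: list[dict], bookmark: datetime, compare=operator.ge) -> list[dict]:
    """Keep records whose timestamp passes `compare` against the bookmark."""
    return [
        r for r in records
        if compare(datetime.fromisoformat(r["timestamp"]), bookmark)
    ]


# Bookmark left behind by the previous run (illustrative value).
bookmark = datetime(2023, 4, 17, 8, 55, tzinfo=timezone.utc)
source = [
    {"id": 1, "timestamp": "2023-04-17T08:55:00+00:00"},  # synced in the last run
    {"id": 2, "timestamp": "2023-04-17T08:55:00+00:00"},  # arrived later, same timestamp
    {"id": 3, "timestamp": "2023-04-17T11:35:00+00:00"},
]

inclusive_ids = [r["id"] for r in select_new(source, bookmark, operator.ge)]
exclusive_ids = [r["id"] for r in select_new(source, bookmark, operator.gt)]
print(inclusive_ids)  # [1, 2, 3]
print(exclusive_ids)  # [3]
```

With `>` the late record 2 is silently skipped. With `>=` it is picked up, at the cost of re-emitting record 1, so this approach works best when the target can deduplicate on a primary key.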
