hawkar_mahmod
03/22/2023, 12:54 PM
visch
03/22/2023, 1:33 PM `_sdc_replication_date` would work.
https://github.com/AutoIDM/tap-indeed/blob/main/tap_indeedsponsoredjobs/streams.py#L128
Is different than your ask but may provide some context
Probably other options but that's the first thing I thought about!
visch
03/22/2023, 1:35 PM `updated_at` field
hawkar_mahmod
03/22/2023, 4:51 PM `Date-MediaSource-Campaign` combination (this is what is set as primary key in the stream) such that we overwrite that data (where necessary) for seven days. So if a record has a Date value 21/3/22, we would "refresh", and attempt to overwrite that value in our target until 29/3/22 (after which we wouldn't care about the 21/3/22).
From what you've said and the example I'm a bit clearer but still not sure on how to proceed. I understand now that a replication key is required but it isn't clicking what that should be. Are you suggesting it should be `sdc_replication_date`, which is just the date the replication attempt was made?
hawkar_mahmod
03/22/2023, 5:06 PM
visch
03/22/2023, 5:08 PM
alexander_butler
03/22/2023, 6:23 PM `--state path/to/state.json`, which is how any Singer tap gets its state. Meltano does it for you so you don't see it. The `state.json` itself is derived from the `target`'s stdout stream. A target receives and buffers `STATE` messages from the tap; whenever the target decides to "commit" data to the data store, it typically "propagates" the buffered state to stdout. Meltano scrapes the stdout for the last propagated state message. This is target specific: some targets will propagate state immediately, some buffer it.
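A toy illustration of the buffering described above may help. This is only a sketch, not how any particular target or Meltano is implemented: the `BufferingTarget` class, the `ads` stream name, and the batch size are all made up for the example. It holds the latest `STATE` value and writes it out only after the buffered records are "committed".

```python
import io
import json


class BufferingTarget:
    """Toy target (hypothetical): emits state only after a commit."""

    def __init__(self, out, batch_size=2):
        self.out = out                # stands in for the target's stdout
        self.batch_size = batch_size
        self.records = []             # records waiting to be committed
        self.pending_state = None     # latest STATE value seen, not yet emitted
        self.committed = []           # stands in for the data store

    def handle_line(self, line):
        msg = json.loads(line)
        if msg["type"] == "RECORD":
            self.records.append(msg["record"])
            if len(self.records) >= self.batch_size:
                self.commit()
        elif msg["type"] == "STATE":
            self.pending_state = msg["value"]

    def commit(self):
        # "Write" the batch, then propagate the buffered state.
        self.committed.extend(self.records)
        self.records.clear()
        if self.pending_state is not None:
            self.out.write(json.dumps(self.pending_state) + "\n")
            self.pending_state = None


def state(value):
    # Helper for the demo: build a STATE message for the made-up "ads" stream.
    bookmarks = {"ads": {"replication_key_value": value}}
    return json.dumps({"type": "STATE", "value": {"bookmarks": bookmarks}})


def record(i):
    return json.dumps({"type": "RECORD", "stream": "ads", "record": {"id": i}})


out = io.StringIO()
target = BufferingTarget(out)
for line in [record(1), state("2023-03-21"), record(2), state("2023-03-22"), record(3)]:
    target.handle_line(line)

emitted = out.getvalue().splitlines()
last_state = json.loads(emitted[-1])
```

In this run the second `STATE` value is still buffered when the input ends, so the last state a runner would see on stdout is the one that went out with the committed batch.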
What it sounds to me like you want is very simply a configurable lookback window. That's pretty common. I would add it to your tap's `config` options so the user can adjust for their specific use case. The actual implementation will just use the built-in SDK method for getting the starting replication key value and subtract a timedelta based on your configured lookback. And I think that's about all you need @hawkar_mahmod
hawkar_mahmod
03/24/2023, 12:39 PM `--state` argument to Singer it seems as though this is by design and the only way to override the state value is to do so in the implementation of the stream, either by overriding `get_starting_timestamp` or resolving to the desired value elsewhere.
The docstring of `get_starting_timestamp` states:
Developers should use this method to seed incremental processing for date
and datetime replication keys. For non-datetime replication keys, use
:meth:`~singer_sdk.Stream.get_starting_replication_key_value()`
Which isn't possible given that Meltano always passes `--state` in. Have I got that right?
alexander_butler
03/24/2023, 5:28 PM `get_starting_timestamp` like you normally would, and afterwards, apply the timedelta subtraction. Do that anywhere your tap uses the state. There shouldn't be any need to do any overrides.
alexander_butler
03/24/2023, 5:31 PM
alexander_butler
03/24/2023, 5:32 PM
alexander_butler
03/24/2023, 5:33 PM
visch
03/24/2023, 6:03 PM `get_starting_timestamp` and if the date is today (2023-03-24) subtract your window (a separate config, maybe call it `window`) from that (7 days as a default?) and then query your system.
hawkar_mahmod
03/28/2023, 4:25 PM
alexander_butler
03/28/2023, 4:27 PM
visch
03/28/2023, 4:29 PM `catalog.json` is where you control how a stream syncs. https://hub.meltano.com/singer/spec#catalog-files
Taps take in config, catalog, and state.
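For reference, the lookback window suggested earlier in the thread can be sketched roughly as below. This is a minimal illustration, not the SDK's own implementation: `apply_lookback` and the `lookback_days` setting are hypothetical names, and the sketch assumes the target upserts on the stream's primary key so that re-fetched rows overwrite the older ones.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def apply_lookback(bookmark: Optional[datetime], lookback_days: int = 7) -> Optional[datetime]:
    """Move the saved bookmark back by `lookback_days` (hypothetical setting).

    Returns None when there is no bookmark yet, so a first run can fall back
    to whatever start date the tap is configured with.
    """
    if bookmark is None:
        return None
    return bookmark - timedelta(days=lookback_days)


# Inside a stream class this might be used along these lines (assumption,
# shown as a comment so the sketch runs without the SDK installed):
#
#   start = apply_lookback(
#       self.get_starting_timestamp(context),
#       self.config.get("lookback_days", 7),
#   )
#
# and `start` would then be sent to the source API as the "from" date.

bookmark = datetime(2023, 3, 24, tzinfo=timezone.utc)
start = apply_lookback(bookmark, lookback_days=7)
first_run = apply_lookback(None)
```

With a bookmark of 2023-03-24 and a seven-day window, the query would start from 2023-03-17, so the most recent week is requested again on every run.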