stephen_bailey
12/14/2021, 4:26 PM
[...] updatedAt field and a createdAt field, but the updatedAt field is null after the initial creation (i.e. never updated)? Is it acceptable, for example, to use post_process to create a created_or_updated_at field that combines the two?
stephen_bailey
12/14/2021, 4:27 PM
[...] updatedAt results in replication key errors when the value is null:
  File "/.../site-packages/singer_sdk/helpers/_state.py", line 220, in increment_state
    if old_rk_value is None or new_rk_value >= old_rk_value:
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
edgar_ramirez_mondragon
12/14/2021, 5:17 PM
> Is it acceptable, for example, to use post_process to create a created_or_updated_at field that combines the two?
I think it is perfectly acceptable 😄. I wonder if the SDK should expose an API for overriding the way the replication key value is extracted/composed from the record.
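A minimal sketch of the post_process idea discussed above, written as a standalone function so it runs without the SDK. In a real tap this would presumably be a method on the stream class, with the stream's replication_key pointing at the new field. The integer timestamps are an assumption based on the traceback earlier in the thread, and the sample records are made up.

```python
from typing import Optional


def post_process(row: dict, context: Optional[dict] = None) -> Optional[dict]:
    """Add a created_or_updated_at field that is never null."""
    updated = row.get("updatedAt")
    # Fall back to createdAt when the record has never been updated.
    row["created_or_updated_at"] = (
        updated if updated is not None else row.get("createdAt")
    )
    return row


# Hypothetical records: one never updated, one updated after creation.
never_updated = post_process({"id": 1, "createdAt": 1639499160, "updatedAt": None})
updated_later = post_process({"id": 2, "createdAt": 1639499160, "updatedAt": 1639505280})
```

Because the combined field always holds a value, the state comparison no longer sees None for new_rk_value.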
stephen_bailey
12/14/2021, 6:08 PM
[...] created_after records and then do a separate pull of updated_after records, because if I just filter by updated_after, it won't pull records that were newly created but never updated! 😭
aaronsteers
12/14/2021, 7:29 PM
> I think it is perfectly acceptable...
Ditto. Yeah, that's what I'd do too. 👍
Using post_process() to add the custom field sounds perfect, and also (friendly reminder) you'll want to declare that property in the stream's schema.
Re @stephen_bailey:
> ...unfortunately there are now api side issues...
😢 - sorry dude. Some APIs are just badly designed... If there's no way to "just get all the records created or modified since this timestamp", then you'll have to make two rounds of calls as you described. There's a way to do this with complex pagination tokens: basically defining the next_page_token as a custom dict and just making sure you loop through both extraction patterns before returning None from get_next_page_token().
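A rough sketch of the dict-token idea above, kept independent of the SDK so it runs on its own. The phase names, the has_more flag, and the page counts are all hypothetical stand-ins for whatever the real API returns; in a tap, the same logic would sit inside the stream's get_next_page_token() and read the response instead.

```python
from typing import Optional

PHASES = ["created_after", "updated_after"]  # hypothetical filter names, run in order


def get_next_page_token(has_more: bool, previous_token: Optional[dict]) -> Optional[dict]:
    """Return the next token, moving to the second phase when the first runs out."""
    token = previous_token or {"phase": PHASES[0], "page": 1}
    if has_more:
        # More pages left in the current phase.
        return {"phase": token["phase"], "page": token["page"] + 1}
    next_index = PHASES.index(token["phase"]) + 1
    if next_index < len(PHASES):
        # Current phase exhausted: restart paging with the next filter.
        return {"phase": PHASES[next_index], "page": 1}
    # Both extraction patterns are done, so pagination stops.
    return None


# Simulate an API with 2 pages of created records and 1 page of updated records.
pages_per_phase = {"created_after": 2, "updated_after": 1}
token, visited = None, []
while True:
    current = token or {"phase": PHASES[0], "page": 1}
    visited.append((current["phase"], current["page"]))
    has_more = current["page"] < pages_per_phase[current["phase"]]
    token = get_next_page_token(has_more, token)
    if token is None:
        break
```

A record that was both created and updated after the bookmark can show up in both phases, so this approach likely relies on the target deduplicating by primary key.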
stephen_bailey
12/14/2021, 7:54 PM
aaronsteers
12/14/2021, 8:29 PM
dan_mosora
12/14/2021, 9:12 PM
[...] created_at into the updated_at to maintain consistency of replication key, rather than inventing a new field. Both approaches are probably fine; it depends on what kind of flexibility you want.
dan_mosora
02/04/2022, 3:20 PM
[...] valid-replication-keys and the sync mode goes through the data set and uses the max of the two.
This is kind of an interesting concept because there's the question of whether to read multiple replication keys as OR or AND. Maybe it's a bit less of a concern than something like key properties, since a replication key is purely informational for either an orchestration/UI/platform actor or the tap's own sync mode, but I thought it an interesting idea to bring up.
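One way to picture the "max of the two" behaviour under the OR reading, as a sketch only: the key names, integer values, and helper functions below are illustrative assumptions, not taken from any particular tap.

```python
from typing import Iterable, Optional

VALID_REPLICATION_KEYS = ["created_at", "updated_at"]  # assumed key names


def record_bookmark(
    record: dict, keys: Iterable[str] = VALID_REPLICATION_KEYS
) -> Optional[int]:
    """Return the largest non-null replication key value on one record."""
    values = [record[k] for k in keys if record.get(k) is not None]
    return max(values) if values else None


def stream_bookmark(records: Iterable[dict]) -> Optional[int]:
    """Return the largest per-record bookmark across the data set."""
    marks = [m for m in map(record_bookmark, records) if m is not None]
    return max(marks) if marks else None


# Made-up records: the first has never been updated.
records = [
    {"id": 1, "created_at": 100, "updated_at": None},
    {"id": 2, "created_at": 90, "updated_at": 140},
]
bookmark = stream_bookmark(records)
```

Under this reading a record counts as changed if either key has advanced; an AND reading would require both keys to be present and newer, which would skip never-updated records.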