Matt Menzenski
05/05/2023, 5:44 PM
I have a `meltano run tap-mongodb target-postgres` job that got ~85% of the way through an incremental load yesterday and then errored out. When I ran that job again today, I was surprised to see it run from the beginning, rather than starting from that 85% mark.
Is the state not saved in a resumable way if the job errors out? Or is there potentially something I need to be doing differently in the tap implementation?
Matt Menzenski
05/05/2023, 6:59 PM
{
  "singer_state": {
    "bookmarks": {
      "backfill_talk_box_conversation": {
        "starting_replication_value": "1970-01-01",
        "progress_markers": {
          "Note": "Progress is not resumable if interrupted.",
          "replication_key": "_id",
          "replication_key_value": "5e15ef2c18fdcf0001eaf01e"
        }
      }
    }
  }
}
visch
05/05/2023, 7:05 PM
Can you share your `meltano.yml`? It'd be helpful; that way we can see which taps/targets you're using, etc.
Matt Menzenski
05/05/2023, 7:08 PM
plugins:
  extractors:
  - name: tap-mongodb
    variant: menzenski
    pip_url: git+https://github.com/menzenski/tap-mongodb.git@1712b4f0d59db434413cc6e4a01a4f199ef0164f
    config:
      add_record_metadata: true
      allow_modify_change_streams: true
  - name: tap-talk-box-backfill
    inherit_from: tap-mongodb
    config:
      database: talk-box
      prefix: backfill_talk_box
    select:
    - backfill_talk_box_conversation.*
    - backfill_talk_box_conversationdefinition.*
    metadata:
      '*':
        replication-key: _id
        replication-method: INCREMENTAL
  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/target-postgres.git@85d932ab14b94f9595a84ade39f9a8e7fa0c5213
    config:
      add_record_metadata: true
      database: paw_crucible
  - name: target-postgres-staging-payit
    inherit_from: target-postgres
    config:
      default_target_schema: staging_payit
Specific command run was `meltano run tap-talk-box-backfill target-postgres-staging-payit`
Matt Menzenski
05/05/2023, 7:09 PM
[…] `"replication_key_value": "5e15ef2c18fdcf0001eaf01e"` value (which is the behavior that I want)
Matt Menzenski
05/05/2023, 7:10 PM
[…] `meltano run` and I should be using `meltano elt`, or whether the issue is that these BSON ObjectId strings aren’t alphanumerically sortable, or if there’s something else
pat_nadolny
05/05/2023, 7:14 PM
[…] `is_sorted` attribute that's defaulted to False. See https://sdk.meltano.com/en/latest/incremental_replication.html#example-code-timestamp-based-incremental-replication. If the tap thinks the output is unsorted, it won't be resumable, because it can't be sure that all records up to the replication key value have been retrieved until the entire stream completes successfully.
Matt Menzenski
05/05/2023, 7:35 PM
• The SDK will throw an error if records come out of order when `is_sorted` is true.
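The reasoning in this part of the thread can be illustrated in plain Python, without the SDK. This is only a sketch: `skipped_on_resume` and `first_out_of_order` are hypothetical helper names, the integer keys stand in for replication key values, and neither function is the SDK's actual implementation.

```python
def skipped_on_resume(keys, interrupted_after):
    """Keys a resumed run would miss if it restarted from the highest key seen so far.

    Assumes interrupted_after >= 1, i.e. at least one record was emitted.
    """
    bookmark = max(keys[:interrupted_after])
    # A resumed incremental run only asks for keys at or after the bookmark,
    # so any not-yet-emitted key below the bookmark is lost.
    return sorted(k for k in keys[interrupted_after:] if k < bookmark)


def first_out_of_order(keys):
    """Index of the first key lower than its predecessor, or None if non-decreasing."""
    for i in range(1, len(keys)):
        if keys[i] < keys[i - 1]:
            return i
    return None


unsorted_keys = [3, 1, 5, 2, 4]
sorted_keys = sorted(unsorted_keys)

print(skipped_on_resume(unsorted_keys, 3))  # [2, 4]
print(skipped_on_resume(sorted_keys, 3))    # []
print(first_out_of_order(unsorted_keys))    # 1
print(first_out_of_order(sorted_keys))      # None
```

With unsorted output, an interruption after three records would leave a bookmark of 5 and silently drop keys 2 and 4, which is why a mid-run bookmark can only be trusted when the stream is declared (and verified) to be sorted.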
pat_nadolny
05/05/2023, 7:48 PM
[…] `check_sorted` method too to disable those checks. For me, I had to get sub-batches that are sorted, but I use `>=`, so sometimes there's an overlap on the edge of a batch where a duplicate record with an earlier timestamp gets sent again, and the SDK was throwing an error because it thought the stream was unsorted when it wasn't.
Matt Menzenski
05/06/2023, 4:05 AM
Setting `is_sorted` to True (when running in incremental replication mode) and setting the replication key value to an ISO-8601 string representation of the ObjectId’s timestamp component (rather than the ObjectId’s hex string representation that I had been using previously) fixed this issue.
Now, when a tap errors, the state is saved in a way that can be resumed on the next run.
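For readers who want to see what that conversion might look like, here is a minimal sketch using only the standard library. It relies on the documented ObjectId layout, where the first 4 bytes are a big-endian count of seconds since the Unix epoch. The function name `object_id_to_iso8601` is hypothetical and this is not necessarily how the tap implements it; pymongo's `ObjectId.generation_time` exposes the same timestamp.

```python
from datetime import datetime, timezone


def object_id_to_iso8601(object_id_hex: str) -> str:
    """Return the ObjectId's embedded creation time as an ISO-8601 string (UTC)."""
    if len(object_id_hex) != 24:
        raise ValueError("expected a 24-character ObjectId hex string")
    # The first 8 hex characters (4 bytes) hold seconds since the Unix epoch.
    seconds = int(object_id_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# The bookmark value from the state shown earlier in the thread:
print(object_id_to_iso8601("5e15ef2c18fdcf0001eaf01e"))  # 2020-01-08T15:03:08+00:00
```

Note that the embedded timestamp has one-second resolution, so several documents can share the same replication key value.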