# troubleshooting
d
Hi, what is the proper approach to re-running a failed or interrupted tap from the last successful state? (for `run`/`elt` commands)
v
`meltano run tap-name target-name`
d
`run` will not set intermediate states (only once at the end of the sync), and `elt` will update state each time it writes a batch, as I understand it.
d
Right, with the interrupted `run` I get additional keys in the state object (`replication_key_signpost`, `starting_replication_value`, `progress_markers`) plus the previous `replication_key_value`, with the note "Progress is not resumable if interrupted." The subsequent `run` started from the same `replication_key_value` and finished correctly. That’s good. So it’s safe to re-run `run` in the pipeline after an interruption or failure. Is there any scenario in which the tap could be forced to re-sync everything from scratch?
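A minimal Python sketch of the behavior described above, assuming a simplified per-stream state dict that uses the same key names the interrupted run left behind. `resume_value` and `finalize` are hypothetical helpers written for illustration, not the Singer SDK's own code, and the timestamps are made up. The real tap logic likely handles more cases than this.

```python
from typing import Any, Dict, Optional

State = Dict[str, Any]


def resume_value(state: State) -> Optional[Any]:
    """Return the bookmark the next run would start from.

    Only the top-level replication_key_value is trusted; anything under
    progress_markers is ignored because it is not resumable.
    """
    return state.get("replication_key_value")


def finalize(state: State) -> State:
    """Promote in-flight progress to the bookmark after a *completed* sync."""
    final = dict(state)  # shallow copy so the input state is left untouched
    # Drop bookkeeping keys that only make sense while the sync is running.
    final.pop("replication_key_signpost", None)
    final.pop("starting_replication_value", None)
    markers = final.pop("progress_markers", {})
    if "replication_key_value" in markers:
        final["replication_key"] = markers["replication_key"]
        final["replication_key_value"] = markers["replication_key_value"]
    return final


# State left behind by an interrupted run (hypothetical values).
interrupted = {
    "replication_key": "updated_at",
    "replication_key_value": "2023-01-01T00:00:00Z",
    "replication_key_signpost": "2023-02-01T00:00:00Z",
    "starting_replication_value": "2023-01-01T00:00:00Z",
    "progress_markers": {
        "Note": "Progress is not resumable if interrupted.",
        "replication_key": "updated_at",
        "replication_key_value": "2023-01-15T00:00:00Z",
    },
}

print(resume_value(interrupted))  # 2023-01-01T00:00:00Z (the old bookmark)
print(finalize(interrupted)["replication_key_value"])  # 2023-01-15T00:00:00Z
```

This mirrors what was observed: after an interruption the next run restarts from the old `replication_key_value`, and only a completed sync moves the bookmark forward.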
p
TLDR: re-running the pipeline is safe and recommended. You can ask it to start over from scratch using `--full-refresh`, but it won’t randomly do that on its own.
The tap implementation manages how it communicates its state progress. I think at least one reason for the “Progress is not resumable” warning is that the records aren’t guaranteed to be sorted: even though record 2 was received by the target, that doesn’t guarantee record 1 has already been loaded. In this case the sync waits until it completes to save state safely. If it is interrupted, it won’t save 2 as the state because of this lack of a guarantee.
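To make the sorting argument concrete, here is a small Python simulation under simplified assumptions: replication-key values are plain integers, the interruption happens after `loaded` records, and a resumed sync requests values greater than or equal to the saved bookmark. `bookmark_after_interrupt` and `missed_on_resume` are hypothetical names for illustration only.

```python
from typing import List, Optional


def bookmark_after_interrupt(values: List[int], loaded: int) -> Optional[int]:
    """Naive bookmark: the max replication value among the first `loaded` records."""
    seen = values[:loaded]
    return max(seen) if seen else None


def missed_on_resume(values: List[int], loaded: int) -> List[int]:
    """Records a resumed sync would skip if it restarted from the naive bookmark."""
    bookmark = bookmark_after_interrupt(values, loaded)
    if bookmark is None:
        return []
    unloaded = values[loaded:]
    # The resumed sync asks for values >= bookmark, so anything below it is lost.
    return [v for v in unloaded if v < bookmark]


sorted_stream = [1, 2, 3, 4, 5]
unsorted_stream = [2, 1, 5, 3, 4]

print(missed_on_resume(sorted_stream, loaded=2))    # []
print(missed_on_resume(unsorted_stream, loaded=3))  # [3, 4]
```

With sorted records, the highest value seen so far is a safe resume point; with unsorted records it can jump past rows that were never loaded, which is why the tap only commits the bookmark at the end.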
Targets are implemented to “emit” state only once they confirm a batch has been successfully loaded into the destination. This means that if the streams are “resumable”, a failure after a few batches have loaded successfully will still have moved the state forward, and the next sync will not resync those batches because they’re confirmed to be in the destination already.
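A toy Python model of that rule, assuming Singer-style `RECORD`/`STATE` message dicts and a target that writes each confirmed state value as a JSON line to its output. `BatchingTarget` is hypothetical and only appends to a list where a real target would write to the database.

```python
import io
import json
import sys
from typing import Any, Dict, List, Optional, TextIO


class BatchingTarget:
    """Toy target: buffers records and emits STATE only after a batch is flushed."""

    def __init__(self, batch_size: int, out: TextIO = sys.stdout) -> None:
        self.batch_size = batch_size
        self.out = out
        self.buffer: List[Dict[str, Any]] = []
        self.pending_state: Optional[Dict[str, Any]] = None
        self.loaded: List[Dict[str, Any]] = []  # stands in for the destination table

    def handle(self, message: Dict[str, Any]) -> None:
        if message["type"] == "RECORD":
            self.buffer.append(message["record"])
            if len(self.buffer) >= self.batch_size:
                self.flush()
        elif message["type"] == "STATE":
            # Hold the state until every record received before it is loaded.
            self.pending_state = message["value"]

    def flush(self) -> None:
        # A real target would write to the destination here and raise on failure.
        self.loaded.extend(self.buffer)
        self.buffer.clear()
        if self.pending_state is not None:
            self.out.write(json.dumps(self.pending_state) + "\n")
            self.pending_state = None

    def close(self) -> None:
        """Final flush at end of stream."""
        self.flush()


out = io.StringIO()
target = BatchingTarget(batch_size=2, out=out)
for msg in [
    {"type": "RECORD", "record": {"id": 1}},
    {"type": "STATE", "value": {"bookmark": 1}},
    {"type": "RECORD", "record": {"id": 2}},  # fills the batch: flush, emit state
    {"type": "RECORD", "record": {"id": 3}},
    {"type": "STATE", "value": {"bookmark": 3}},
]:
    target.handle(msg)

# Pretend the process crashes here: bookmark 3 is never emitted because
# record 3 was not flushed, so a rerun resumes from bookmark 1.
print(out.getvalue(), end="")  # {"bookmark": 1}
```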
d
@pat_nadolny For example, I see `[info ] INFO Writing table batch with 49 rows for ('orders', 'refunds')... cmd_type=loader name=target-postgres`, and I don’t know whether the state will be updated. I have a lot of `Writing table batch` lines, but the state table is still empty. I used the command `elt tap-shopify target-postgres --state-id=XXX`.
v
Which target-postgres variant are you using, @dima_petukhov?
d
datamill-co
v
https://github.com/datamill-co/target-postgres/blob/master/target_postgres/stream_tracker.py This target doesn't emit state along the way. They have a comment there saying Singer is the reason, which is wrong. You can try the MeltanoLabs variant, and you should see state properly updated along the way.
d
@visch As I understand it, the meltano variant is not stable.
p
One distinction is between a target emitting state and Meltano writing it into the DB. I’m not positive how this works (cc @cody_hanson would know), but my understanding is that Meltano collects these state messages during a sync and only writes the state to the DB when the sync exits, whether it completed or failed. So I wouldn’t expect the DB to be updated mid-sync, but that doesn’t mean state isn’t being tracked. @cody_hanson can you clarify how it works?
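Pat is explicitly unsure here, so treat this as a model of his description rather than of Meltano's actual code. A hypothetical `run_and_persist` helper reads the state lines a target writes, keeps only the latest one, and saves it once in a `finally` block, so it is written whether the run succeeds or fails. `save_state` is a stand-in for whatever backend stores state.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional


def run_and_persist(
    target_stdout: Iterable[str],
    save_state: Callable[[Dict[str, Any]], None],
) -> Optional[Dict[str, Any]]:
    """Track the latest state a target emits and write it once when the run ends.

    Hypothetical model: nothing is saved mid-sync, but the last confirmed
    state is still saved if the run fails partway through.
    """
    latest: Optional[Dict[str, Any]] = None
    try:
        for line in target_stdout:
            line = line.strip()
            if line:
                latest = json.loads(line)  # each line is one emitted state value
    finally:
        if latest is not None:
            save_state(latest)
    return latest


saved: List[Dict[str, Any]] = []


def lines_then_crash() -> Iterator[str]:
    """Simulated target output that fails after two confirmed batches."""
    yield '{"bookmark": 1}\n'
    yield '{"bookmark": 2}\n'
    raise RuntimeError("loader went away")


try:
    run_and_persist(lines_then_crash(), saved.append)
except RuntimeError:
    pass

print(saved)  # [{'bookmark': 2}]
```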
d
@visch `elt` hit an error when I uninstalled the loader and, magically, it wrote a state before quitting with the error. How could we gracefully stop a running program? Which keyboard shortcut is better to use in this case?
v
@dima_petukhov If you use the MeltanoLabs target, you'd be helping us get it closer to stable. I run it in production myself for 4 separate projects that run all the time.
m
v
Which target are you using, @Matt Menzenski? This thread was using pipelinewise.
m
I’m on MeltanoLabs target-postgres
v
Not the same thing then 😕