How does Singer or Meltano handle the case where I...
# best-practices
l
How does Singer or Meltano handle the case where I need to re-sync an entire table? By re-sync I mean deleting the table (or all its records) from the target and syncing all records from the source again. I don't see a Singer message type that indicates deleting a table or its records. Do I have to deal with this logic myself?
t
As far as I know, yes. I've had to do this a number of times myself; my process is:
1. Drop the table in the destination.
2. Run `meltano state get some_id > state.json` to get the state of the pipeline.
3. Edit state.json to remove the entry for the table in question.
4. Run `meltano state set some_id --input-file state.json` to load the modified state data.
5. Run the pipeline.
That's my general process anyway. The commands may not be perfect, I just typed them out from memory. 😉
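The "edit state.json" step can be scripted. Here is a minimal Python sketch, assuming the exported state keeps per-stream bookmarks under `singer_state` → `bookmarks` (that layout is an assumption, so check your own state.json first). The helper names `drop_stream_bookmark` and `rewrite_state_file` are hypothetical, not part of Meltano.

```python
import json


def drop_stream_bookmark(state: dict, stream: str) -> dict:
    """Return a copy of `state` with the bookmark for `stream` removed."""
    # Round-trip through JSON as a simple deep copy, so the input is untouched.
    new_state = json.loads(json.dumps(state))
    # Assumption: bookmarks sit under "singer_state"; fall back to the top level.
    singer_state = new_state.get("singer_state", new_state)
    # pop() with a default is a no-op when the stream has no bookmark.
    singer_state.get("bookmarks", {}).pop(stream, None)
    return new_state


def rewrite_state_file(path: str, stream: str) -> None:
    """Load a state file, drop one stream's bookmark, and write it back."""
    with open(path) as f:
        state = json.load(f)
    with open(path, "w") as f:
        json.dump(drop_stream_bookmark(state, stream), f, indent=2)
```

Calling `rewrite_state_file("state.json", "my_table")` between the `state get` and `state set` commands would replace the manual edit; `my_table` is a placeholder for the stream name as it appears in your state.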
l
thanks
I think it would be very useful if the Singer spec supported a DELETE TABLE (or TRUNCATE TABLE) message. In some cases, only the extractor knows when a re-sync should be triggered. @aaronsteers
a
Singer does have something close to this, but it is not yet 'officially' part of the spec or implemented broadly. The `ACTIVATE_VERSION` message can be sent by a tap after a full table sync. The version is generally implemented as an epoch timestamp that tags the records in this 'latest' sync. The target can then perform either a soft or hard delete of records that do not have the latest table version.
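To make the idea concrete, here is a rough Python sketch of how a target might react to such a message. It is not the SDK's implementation: the `InMemoryTarget` class is hypothetical, and the message shape (a `version` field on both `RECORD` and `ACTIVATE_VERSION` messages) is an assumption based on the description above, since this is not officially part of the spec.

```python
class InMemoryTarget:
    """Toy target that keeps rows in memory, tagged with their table version."""

    def __init__(self):
        # stream name -> list of {"version": ..., "record": ...} entries
        self.rows = {}

    def handle(self, message: dict) -> None:
        kind = message["type"]
        stream = message.get("stream")
        if kind == "RECORD":
            # Assumption: each record carries the version of the sync it came from.
            entry = {"version": message.get("version"), "record": message["record"]}
            self.rows.setdefault(stream, []).append(entry)
        elif kind == "ACTIVATE_VERSION":
            # Hard delete: keep only rows written under the newly activated version.
            active = message["version"]
            self.rows[stream] = [
                r for r in self.rows.get(stream, []) if r["version"] == active
            ]


target = InMemoryTarget()
# Rows left over from an earlier sync (version 1), then a fresh full sync (version 2).
target.handle({"type": "RECORD", "stream": "users", "version": 1, "record": {"id": 1}})
target.handle({"type": "RECORD", "stream": "users", "version": 1, "record": {"id": 2}})
target.handle({"type": "RECORD", "stream": "users", "version": 2, "record": {"id": 2}})
target.handle({"type": "ACTIVATE_VERSION", "stream": "users", "version": 2})
remaining = [r["record"] for r in target.rows["users"]]
```

After the last message, only the version 2 row remains. A soft-delete variant would mark the stale rows with a deletion timestamp instead of dropping them.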
The SDK has partial built-in support as of now, with some follow-ons logged here: Add support for ACTIVATE_VERSION message types · Issue #18 · meltano/sdk (github.com)
Contributions welcome, by the way! This is something we want to make sure is fully implemented by our 1.0 SDK launch.