# singer-tap-development
s
Hey team! I was wondering what technique you guys used to delete entries in your warehouse after they have been deleted in your source? Currently, I have more entries in my warehouse than I do in my source because elements deleted in my source have not been propagated to the warehouse (following an insert-only mentality). Is there a quick meltano command to trigger a flush of the warehouse?
a
There's no flush, per se, at least not yet. What you can do is run a FULL_TABLE sync on top of your existing INCREMENTAL-based loads. Then you can logically remove anything (by physical deletion or by a soft-delete flag) whose last-updated timestamp in the warehouse is older than the start of your full sync: every row that still exists in the source gets rewritten by that sync, so whatever wasn't touched must have been deleted upstream.
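A minimal sketch of that reconciliation step, using an in-memory SQLite database as a stand-in for the warehouse. It assumes the loader upserts on the primary key and stamps every row it writes with a load-time column. Here that column is the hypothetical `_loaded_at`. Many Singer targets can add a similar metadata column (for example `_sdc_batched_at`), but check what your target actually writes. `reconcile_after_full_sync` is a hypothetical helper, not a Meltano command.

```python
import sqlite3


def reconcile_after_full_sync(conn: sqlite3.Connection, table: str,
                              sync_started_at: str, soft_delete: bool = False) -> int:
    """Delete (or flag) rows that the latest FULL_TABLE sync did not rewrite.

    Assumes a hypothetical `_loaded_at` column that the loader sets on every
    upsert. Rows still present in the source get a fresh value during the
    full sync, so anything older than `sync_started_at` is gone upstream.
    """
    # `table` is interpolated into the SQL, so it must come from trusted config.
    if soft_delete:
        sql = f"UPDATE {table} SET is_deleted = 1 WHERE _loaded_at < ?"
    else:
        sql = f"DELETE FROM {table} WHERE _loaded_at < ?"
    cur = conn.execute(sql, (sync_started_at,))
    conn.commit()
    return cur.rowcount  # number of rows removed or flagged


# Demo: three rows loaded incrementally, then id 2 is hard-deleted at the source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "_loaded_at TEXT, is_deleted INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO orders (id, _loaded_at) VALUES (?, ?)",
                 [(i, "2023-01-01T00:00:00") for i in (1, 2, 3)])

# The full sync starts; the source only re-emits ids 1 and 3, which the
# loader upserts with a fresh load timestamp.
sync_started_at = "2023-02-01T00:00:00"
conn.executemany("INSERT OR REPLACE INTO orders (id, _loaded_at) VALUES (?, ?)",
                 [(1, "2023-02-01T00:05:00"), (3, "2023-02-01T00:05:00")])

removed = reconcile_after_full_sync(conn, "orders", sync_started_at)
print(removed)  # 1
print([row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")])  # [1, 3]
```

Pass `soft_delete=True` to flag rows instead of removing them. The timestamps are ISO-8601 strings in a single format, so plain string comparison orders them correctly; a real warehouse would use a proper timestamp type.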
There are other methods as well, but this is generally the easiest. The "best" method is to try to get soft-delete flags upstream in the source system but of course, this isn't always an option.
Another path is to see if your source system can support LOG_BASED replication, since this can provide delete events (similar to the soft-delete column addition upstream).
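To illustrate why delete events help, here is a minimal sketch of applying a LOG_BASED change stream to a table, with a plain dict keyed by primary key standing in for the warehouse. It assumes deletes arrive as ordinary records with a non-null `_sdc_deleted_at` field, which is the convention some Singer taps use for log-based streams (it varies by tap, so check yours). `apply_change_events` is a hypothetical helper. Setting `hard_delete=False` keeps the row along with its delete marker, which amounts to the soft-delete approach.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def apply_change_events(table: Dict[Any, Record], events: Iterable[Record],
                        key: str = "id", hard_delete: bool = True) -> Dict[Any, Record]:
    """Apply upsert/delete events to a table keyed by primary key.

    Assumes (hypothetically) that a deleted row arrives as a record whose
    `_sdc_deleted_at` value is set; any other record is an insert or update.
    """
    for event in events:
        pk = event[key]
        if event.get("_sdc_deleted_at") is not None and hard_delete:
            table.pop(pk, None)  # physically remove the row if present
        else:
            table[pk] = event    # upsert; a soft-deleted row keeps its marker
    return table


warehouse = {
    1: {"id": 1, "name": "alpha", "_sdc_deleted_at": None},
    2: {"id": 2, "name": "beta", "_sdc_deleted_at": None},
}
events = [
    {"id": 2, "name": "beta", "_sdc_deleted_at": "2023-02-01T00:00:00Z"},  # delete
    {"id": 3, "name": "gamma", "_sdc_deleted_at": None},                   # insert
]
apply_change_events(warehouse, events)
print(sorted(warehouse))  # [1, 3]
```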
Is this helpful at all?
s
Yes thanks! Sadly not the magic answer I was hoping for, but still it's great 😉
a
Hi @Stéphane Burwash - our @daniel_walker reviewed a number of approaches and put it up on medium. I'll ask if he can post the link here. Hope it helps.
d
Hey, yeah, we ran into a "data deleted from source" problem a while back and I ended up trying out a few ideas. Wrote up what I found: https://medium.com/@danielpdwalker/handling-hard-deleted-data-from-source-5578e67f5a0c Hopefully helpful 🙂
s
Thank you so much, this is super awesome!