# singer-tap-development
m
Hi all. I’m trying to implement a solution for `hard-deletes` in the source of my tap that I’m developing with the SDK. I’ve seen in the thread that `ACTIVATE_VERSION` might be the way to go (vs. some custom logic checking the `_sdc_batched_at` columns for stale records). First off, if someone has a better idea, I’m all ears. If `ACTIVATE_VERSION` is the way to go, though, I’m having some trouble wrapping my head around how to implement it in the SDK. Could someone point me to an example tap that uses `ACTIVATE_VERSION` that might move along my understanding? Any hints on how it actually interacts with a target (I think it uses the operational columns?) OR how to tell if a target is compatible would also be rad. Thanks!
a
Unfortunately, the `ACTIVATE_VERSION` message is not yet supported in the SDK for taps. We are accepting merge requests, and we have a description of the work to do here: Add support for ACTIVATE_VERSION message types (#18) · Issues · Meltano / Meltano SDK for Singer Taps and Targets · GitLab
We currently have it defined as a feature we'd like to include at or before the 1.0 SDK release.
m
Gotcha. Thanks. I’ll take a look. It sounds like something like this would be the simplest supported solution for now, right? I might be a bit confused about the `_sdc_batched_at` (and `extracted`) column, but I’ll get multiple timestamps for a “long” sync, right? So I would just check for records within an interval (e.g. `_sdc_batched_at >= max(_sdc_batched_at) - interval '12 hour'`)
a
Exactly
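For anyone following along, the interval check above can be sketched in plain Python. This is only an illustration of the idea, not SDK or target code: the function name `split_by_freshness`, the `id` field, and the in-memory list of rows are assumptions for the example, and the 12-hour window is just the value from the message above.

```python
from datetime import datetime, timedelta


def split_by_freshness(rows, window=timedelta(hours=12)):
    """Partition rows into (fresh, stale) relative to the newest batch timestamp.

    Hypothetical helper: a row is "fresh" when its _sdc_batched_at falls within
    `window` of the maximum _sdc_batched_at seen across all rows.
    """
    if not rows:
        return [], []
    cutoff = max(r["_sdc_batched_at"] for r in rows) - window
    fresh = [r for r in rows if r["_sdc_batched_at"] >= cutoff]
    stale = [r for r in rows if r["_sdc_batched_at"] < cutoff]
    return fresh, stale


# Made-up rows: ids 1 and 2 were loaded by the latest (long) sync,
# id 3 was last loaded a day earlier and is a hard-delete candidate.
example_rows = [
    {"id": 1, "_sdc_batched_at": datetime(2024, 1, 2, 10, 0)},
    {"id": 2, "_sdc_batched_at": datetime(2024, 1, 2, 10, 45)},
    {"id": 3, "_sdc_batched_at": datetime(2024, 1, 1, 9, 0)},
]
fresh, stale = split_by_freshness(example_rows)
```

The window needs to be longer than the longest sync but shorter than the gap between syncs, otherwise rows from the previous run would still count as fresh.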
m
Thanks!!
d
We ran into a problem like this not long ago, and I put what I found into a Medium article: https://medium.com/@danielpdwalker/handling-hard-deleted-data-from-source-5578e67f5a0c
Since then, though, when @Reuben (Matatika) was making tap-spotify, we made a wrapper stream that adds a synced datetime to the data. Then, using dbt, you can take each unique id at its most recent sync and mark it as hard deleted if that sync time doesn't equal the max sync time.
https://github.com/Matatika/tap-spotify/blob/master/tap_spotify/schemas/utils/synced_at.py
https://github.com/Matatika/tap-spotify/blob/3df1041120bd605cc937f4526a707caa3533abe3/tap_spotify/streams.py#L40
The data streams inherit from the SyncedAt stream.
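A rough sketch of that marking logic, written in plain Python rather than dbt so it is easy to run. It is not taken from tap-spotify: the field names `id` and `synced_at`, the `hard_deleted` flag, and the function name are assumptions for illustration, and it assumes every record emitted in one sync run carries the same `synced_at` value.

```python
from datetime import datetime


def mark_hard_deleted(records):
    """Keep the most recently synced row per id and flag ids missing from the latest sync.

    Hypothetical helper: an id counts as hard deleted when its latest
    synced_at differs from the maximum synced_at across all ids.
    """
    latest_by_id = {}
    for rec in records:
        prev = latest_by_id.get(rec["id"])
        if prev is None or rec["synced_at"] > prev["synced_at"]:
            latest_by_id[rec["id"]] = rec
    if not latest_by_id:
        return []
    max_synced = max(r["synced_at"] for r in latest_by_id.values())
    return [
        {**r, "hard_deleted": r["synced_at"] != max_synced}
        for r in latest_by_id.values()
    ]


# Made-up data for two sync runs: "b" disappears from the source before run 2.
run_1 = datetime(2024, 1, 1, 0, 0)
run_2 = datetime(2024, 1, 2, 0, 0)
example_records = [
    {"id": "a", "synced_at": run_1},
    {"id": "b", "synced_at": run_1},
    {"id": "a", "synced_at": run_2},
    {"id": "c", "synced_at": run_2},
]
marked = {r["id"]: r["hard_deleted"] for r in mark_hard_deleted(example_records)}
```

If the synced timestamp were taken per record instead of once per run, the equality check would need to become a tolerance window like the `_sdc_batched_at` interval discussed earlier in the thread.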
a
Cc @amanda.folson @pat_nadolny