# singer-tap-development
s
Weird question: I got a request with the following requirements:
• We load data from a specific source into our warehouse (BigQuery)
• Once a new record has been added, we want to:
  ◦ Validate it manually
  ◦ Parse the data into a usable format
  ◦ Send this latest record as a Slack alert
Has anyone ever worked with something like this? Any idea how one could do this?
a
Hi, @Stéphane Burwash. Can you say a bit more about the manual validation step? Do you perhaps want to keep records in a queue until they receive signoff from a human?
s
The format hasn't been established yet, so it's still fluid; it may be a feature flag that comes in with the record, or a manual input. The goal is to send out a notification ONCE for each record, once it has been deemed done. It's the "once" part that is really hurting my brain.
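A minimal sketch of the "once per record" requirement, assuming each record has a hypothetical `id` field and a hypothetical `is_validated` flag (the real validation signal is still undecided). `send_alert` is a placeholder for whatever actually posts to Slack (for example an incoming webhook). The "once" guarantee only holds if the returned set of notified IDs is persisted between runs.

```python
from typing import Callable, Iterable


def notify_once(
    records: Iterable[dict],
    notified_ids: set[str],
    send_alert: Callable[[dict], None],
) -> set[str]:
    """Alert on each validated record the first time it is seen as done.

    Returns the updated set of notified IDs; the caller must persist it
    (e.g. in a warehouse table or a state file) for the next run.
    """
    newly_notified: set[str] = set()
    for record in records:
        record_id = record["id"]  # hypothetical primary key field
        # Hypothetical "done" flag; skip records that are not signed off yet.
        if not record.get("is_validated", False):
            continue
        # Skip records already alerted on in a previous run or earlier in this batch.
        if record_id in notified_ids or record_id in newly_notified:
            continue
        send_alert(record)
        newly_notified.add(record_id)
    return notified_ids | newly_notified
```

If the returned set is saved only after the alerts succeed, a crash in between means a retry re-sends some alerts (at-least-once); saving the IDs before sending flips that to at-most-once. Truly exactly-once delivery would need the alert and the state write to be committed together.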
a
Thanks very much for the additional context. The best approach here is probably something we have not built yet in the SDK, although I know there is some 'prior art' in this area: Feature: duplicate-proof replication · Issue #161 · meltano/sdk (github.com)
The above proposal suggests that tap developers and/or tap users could opt in to duplicate-proof replication. When enabled, we'd add something like a `record_hashes_seen` array to the `STATE`.
For normal incremental replication, we'd only need to track the hashes of records whose replication key value is greater than or equal to the bookmarked value, since those are the ones that may come through again on subsequent syncs. For full-table replication, we'd track the hashes of all records seen, which would basically emulate incremental replication.
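To make the incremental case concrete, here is a minimal sketch in plain Python, not SDK code (per the issue above, the SDK does not offer this yet). The function names and the simplified state layout (`replication_key_value` plus `record_hashes_seen`) are assumptions for illustration, and replication key values are assumed to be mutually comparable (e.g. ISO-8601 strings or integers).

```python
import hashlib
import json
from typing import Any, Iterable


def record_hash(record: dict[str, Any]) -> str:
    """Stable content hash of a record; key order does not affect the result."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe_incremental(
    records: Iterable[dict[str, Any]],
    state: dict[str, Any],
    replication_key: str,
) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Drop records whose hash is already in state; return (emitted, new_state)."""
    seen = set(state.get("record_hashes_seen", []))
    prev_bookmark = state.get("replication_key_value")
    bookmark = prev_bookmark
    key_by_hash: dict[str, Any] = {}  # hash -> replication key value, this sync
    emitted = []
    for record in records:
        h = record_hash(record)
        value = record[replication_key]
        key_by_hash[h] = value
        if h in seen:
            continue  # duplicate re-sent by the `>=` bookmark query
        seen.add(h)
        emitted.append(record)
        if bookmark is None or value > bookmark:
            bookmark = value
    # Only records at or above the bookmark can come through again next sync,
    # so hashes of older records are pruned from state.
    kept = {h for h, v in key_by_hash.items() if v >= bookmark}
    if bookmark == prev_bookmark:
        # Bookmark did not move: previously stored hashes are still at the bookmark.
        kept |= set(state.get("record_hashes_seen", []))
    new_state = {"replication_key_value": bookmark, "record_hashes_seen": sorted(kept)}
    return emitted, new_state
```

For full-table replication the pruning step would simply be skipped, keeping every hash ever seen, which is what makes state grow with the table size in that mode.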
Does this sound like a viable approach? Any other options/alternatives I'm not thinking of?
s
This definitely sounds viable. I will need to look into how this could work, and whether I could do it directly through dbt, but I will keep you guys updated on what I end up doing. Regardless of what I end up doing, this should be a very cool project.