# singer-tap-development

julian_knight

08/04/2021, 5:56 PM
Following up on an earlier question (thanks for pointing us to `ACTIVATE_VERSION`!) with a clearer understanding, I'm curious about Singer best practice in this situation. (cc @ryan_bell)

Context: we want to sync Klaviyo list members, which is the set of emails belonging to an email list. Users can be added to and removed from a list, so this isn't a typical `INCREMENTAL`-type replication; it's more like a many-to-many join table. We could do `FULL_TABLE` replication and then use `ACTIVATE_VERSION` to delete the versions from older runs. However, this would still require us to stream all of the data for all of the lists on every run. Some of these lists have millions of members, and many of them don't update frequently, so we were thinking of using state to keep track of last-updated-at for each list and only replicating lists that have changed.

I see a few options here. I'd like to get people's thoughts on the best approach, or on one we haven't considered:

1. Suck it up and replicate all the members of all the lists on every run. This may not be an option, as we want these updated at least daily and replication takes several hours.
2. Send each email list as its own stream. Then `ACTIVATE_VERSION` will work and each list can track its own state. However, this seems like bad practice, as the streams are now dynamic.
3. Don't send `ACTIVATE_VERSION`; instead, use some custom Python code after the regular EL pipeline to do a more complex cleanup of old records.
4. Maybe there's some kind of partitioning solution in Singer I'm not aware of that could handle this?