Hey all, I'm working with a team on integrating a ...
# singer-tap-development
r
Hey all, I'm working with a team on integrating a tap for Klaviyo into our data warehouse. For those unfamiliar, Klaviyo handles email metrics. Currently, we're looking to pull in the emails of list members and then update those tables by both appending and removing list members. The problem we're running into is in figuring out how best to update tables if a target doesn't support record removal or know the replication method of a tap. Would Target-Postgres be limited to having to make a brand new table and load it each time a list member is removed? Has anyone encountered a similar situation? Also, let me know how else I can clarify the problem if need be.
a
I think this may require using "ACTIVATE"
ACTIVATE_VERSION
message feature - which is a Singer extension. By the tap sending an ACTIVATE_VERSION message along with a row version number. they signal the deactivation of records prior to a specific version number in the target. Can you clarify one point in your requirement? Are you able to "see" which records are deleted in the upstream job? If yes, then a soft-delete flag would probably be preferable. If no, then even with the ACTIVATE_VERSION message feature, you may be forced into the equivalent of doing a full table extract each time, since that's the only way to actually identify which were present previously and are no longer present now.
d
(the message is called
ACTIVATE_VERSION
: https://gitlab.com/meltano/meltano/-/issues/2508)
r
Are you able to "see" which records are deleted in the upstream job?
I'm almost positive the answer to this is no, but let me see if I can find any additional information anywhere Thank you guys for such a quick response! I'll update the thread once I have more clarification on my side
It seems to be that there is no memory of which records get deleted. The replication method used by the stream is full table, so instead it's keeping track of a whole new table with the current list members
p
@ryan_bell i don't know exactly how target-postgres works, however i use a target-snowflake which is based on it and in my case it generates a column with the time of the update. The way i usually handle this is to do in DBT a table or view such as:
Copy code
select *
from source_table
where _sdc_batched_at = (select max(_sdc_batched_at) from source_table)
Would that work for you? You don't even need dbt, you can just create such a view directly on SQL
I think the sdc columns are not target-snowflake specific, but actually inherited from Postgres, but i never used target-postgres myself
a
_sdc_
columns are documented here in the SDK docs: SDK Implementation Details - Record Metadata — Meltano SDK 0.3.3 documentation Many taps and targets will support these columns, and they are helpful when provided. (Some targets actually have a setting to toggle whether you wnt them written or not.)
And also, to echo what @pablo_seibelt said, if you are running a full extract each day, you can generally grab just the non-deleted versions by leveraging the date in an sdc metadata column.
h
@ryan_bell Off topic, but are you guys just using the Klaviyo metrics API? Or are you grabbing list data? We were considering building a Klaviyo tap ourselves
r
We're only pulling in lists and list members, so I believe that we don't use the Klaviyo metrics API
h
@ryan_bell Oh cool, is it open source by any chance?
r
The API or our tap?
h
@ryan_bell The tap
r
Right now we don't have a functioning tap released. Not all the features we were looking to include were properly implemented so for the time being we still have it unreleased
h
Makes sense, would be amazing to share it with the community once it’s ready. The current Klaviyo tap is pretty limited