# plugins-general
r
Hey there, long-time user of Meltano here, but I've been running into some issues over the last week or so with `tap-bing-ads`, which is a new plugin for the project I'm working on. I've got it set up and authorized to the point where I can successfully pull in the data relevant to us, which for our use case is the campaign and ads performance reports. However, whatever controls replication in this tap doesn't seem to keep records distinct (or not in a way I fully understand). If I pull in the last week of data for the two streams I mentioned (`campaign_performance_report`, `ads_performance_report`), all is well. But the next time I run the tap, instead of skipping the records already in my Postgres DB, it creates a copy of each one: if there were 7 records before, now there are 14. Obviously this can be fixed with some SQL downstream, but I'd rather not fill the DB with tons of duplicate records. Has anyone else run into this while working with `tap-bing-ads`? I'm assuming this is the relevant line (below), but even after changing the list of primary keys to what I think makes a record distinct, I still get duplicates:
```python
singer.write_schema('ads', ads_schema, ['Id'])
```
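For context on why this line matters: the third argument to `singer.write_schema` is the stream's `key_properties`. Below is a minimal in-memory sketch, assuming the target behaves like many Singer Postgres targets (upsert on `key_properties` when they are set, plain append when they are empty). The `load` helper and the `TimePeriod`/`Clicks` fields are hypothetical stand-ins, not any real target's code.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def load(table: List[Row], batch: List[Row], key_properties: List[str]) -> List[Row]:
    """Hypothetical stand-in for a target: upsert on key_properties, else append."""
    if not key_properties:
        # No keys: every incoming record becomes a new row.
        return table + batch
    # Index existing rows by their key tuple so incoming rows replace them.
    merged: Dict[Tuple[object, ...], Row] = {
        tuple(row[k] for k in key_properties): row for row in table
    }
    for row in batch:
        merged[tuple(row[k] for k in key_properties)] = row
    return list(merged.values())


# One week of daily rows for one entity, loaded twice (two runs over the same window).
week = [{"Id": 1, "TimePeriod": f"2024-01-0{d}", "Clicks": d} for d in range(1, 8)]

appended = load(load([], week, []), week, [])
upserted = load(load([], week, ["Id", "TimePeriod"]), week, ["Id", "TimePeriod"])
print(len(appended), len(upserted))  # 14 7
```

In this toy data, keying on `Id` alone would go wrong in the other direction: all seven daily rows share `Id = 1`, so they would collapse into a single row. That is why a report-style stream usually needs a composite key that includes the date column.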
v
I don't know `tap-bing-ads` well, so there may be other workarounds, but this sounds like https://sdk.meltano.com/en/latest/implementation/at_least_once.html
The issue is normally that the API offers a filter like `updated_at >= <timestamp>` (e.g. yesterday at exactly 10:00 pm). You want to set that timestamp from the `updated_at` of your last record, since that's the last record you've received. Because the comparison is inclusive, records sitting exactly at that boundary can come through again on the next run. There are some mechanisms that could be built into the tap to fight this off, like a hash check or something similar that we'd store in state.
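A minimal sketch of that hash-check idea, assuming records carry an ISO-8601 `updated_at` replication key and the tap can persist a small dict in state. The state layout (`bookmark`, `boundary_hashes`) and the `filter_new` helper are hypothetical; this is not part of the Meltano SDK or `tap-bing-ads`.

```python
import hashlib
import json
from typing import Dict, List


def record_hash(record: Dict) -> str:
    # Stable hash: serialize with sorted keys so field order doesn't matter.
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def filter_new(records: List[Dict], state: Dict) -> List[Dict]:
    """Drop records already emitted at the bookmark boundary, then advance state.

    Assumes the API returned records with updated_at >= state["bookmark"],
    so records exactly at the bookmark may be repeats from the previous run.
    """
    bookmark = state.get("bookmark")
    seen = set(state.get("boundary_hashes", []))
    fresh = [
        r for r in records
        if not (r["updated_at"] == bookmark and record_hash(r) in seen)
    ]
    if records:
        # New bookmark = last updated_at received (same-format ISO strings sort correctly).
        new_bookmark = max(r["updated_at"] for r in records)
        boundary = {record_hash(r) for r in records if r["updated_at"] == new_bookmark}
        if new_bookmark == bookmark:
            boundary |= seen  # bookmark didn't move: keep earlier boundary hashes too
        state["bookmark"] = new_bookmark
        state["boundary_hashes"] = sorted(boundary)
    return fresh


state: Dict = {}
run1 = [
    {"id": 1, "updated_at": "2024-01-01T22:00:00Z"},
    {"id": 2, "updated_at": "2024-01-02T08:00:00Z"},
]
print(len(filter_new(run1, state)))  # 2
# Next run: the inclusive filter returns record 2 again, plus a genuinely new record 3.
run2 = [
    {"id": 2, "updated_at": "2024-01-02T08:00:00Z"},
    {"id": 3, "updated_at": "2024-01-02T09:30:00Z"},
]
print([r["id"] for r in filter_new(run2, state)])  # [3]
```

Hashing the whole record (rather than just a key) means a boundary record whose content changed without its `updated_at` moving would still be emitted, which is usually the safer trade-off.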
There's example SQL for fixing this in that SDK post.
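The SDK post has its own SQL; as a hedged illustration of the same idea (keep one row per key, delete the rest), here is a self-contained version using Python's built-in `sqlite3`. The column names are hypothetical, and `rowid` is SQLite-specific; in Postgres you would typically use `ctid` or `ROW_NUMBER()` instead.

```python
import sqlite3

# Hypothetical report table loaded twice, so every row is duplicated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_performance_report "
    "(CampaignId INTEGER, TimePeriod TEXT, Clicks INTEGER)"
)
rows = [(10, f"2024-01-0{d}", d * 3) for d in range(1, 8)]
conn.executemany("INSERT INTO campaign_performance_report VALUES (?, ?, ?)", rows * 2)

# Keep the first physical row per (CampaignId, TimePeriod); delete the rest.
conn.execute(
    """
    DELETE FROM campaign_performance_report
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM campaign_performance_report
        GROUP BY CampaignId, TimePeriod
    )
    """
)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM campaign_performance_report").fetchone()[0]
print(count)  # 7
```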
r
Thanks for helping out. I did figure out what was wrong: I was just looking at the wrong `write_schema()` lines 🤦. I was looking at the ones for `ads` and `campaigns` instead of the ones for the performance reports. For anyone else who runs into this, I simply modified the primary keys within the `sync_report_interval()` function.
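For reference, `singer.write_schema` emits a Singer `SCHEMA` message whose `key_properties` the target uses to tell records apart. Below is a hedged sketch of that kind of change that doesn't depend on `singer-python`: it builds the message for a report stream with a composite key and checks that every key exists in the schema, so a misspelled column fails loudly. The key columns (`AccountId`, `CampaignId`, `TimePeriod`) and the `report_schema_message` helper are hypothetical; check your report's actual columns for the right combination.

```python
import json
from typing import Dict, List


def report_schema_message(stream: str, schema: Dict, key_properties: List[str]) -> str:
    """Hypothetical helper: build a Singer SCHEMA message with validated key properties."""
    missing = [k for k in key_properties if k not in schema.get("properties", {})]
    if missing:
        raise ValueError(f"key properties not in schema: {missing}")
    return json.dumps(
        {"type": "SCHEMA", "stream": stream, "schema": schema, "key_properties": key_properties}
    )


# Hypothetical subset of a performance-report schema.
schema = {
    "type": "object",
    "properties": {
        "AccountId": {"type": ["null", "integer"]},
        "CampaignId": {"type": ["null", "integer"]},
        "TimePeriod": {"type": ["null", "string"]},
        "Clicks": {"type": ["null", "integer"]},
    },
}

# Inside sync_report_interval(), the singer-python call would look roughly like:
#   singer.write_schema(stream_name, schema, ["AccountId", "CampaignId", "TimePeriod"])
msg = report_schema_message(
    "campaign_performance_report", schema, ["AccountId", "CampaignId", "TimePeriod"]
)
print(msg)
```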