cc: <@U06CC5F08QK> We're having a problem where w...
# plugins-general
c
cc: @TomasB We're having a problem where we are getting duplicate records (looks like probably the last record from the previous run). We're using postgres transferwise tap and target-bigquery adswerve target. We have replication mode set to INCREMENTAL. On the adswere Readme, they mention:
Copy code
Upserting new rows into the table, using the primary key given by the tap connector (if it finds an old row with same key, updates it. Otherwise it inserts the new row)
We have set the:
Copy code
table-key-properties:
              - id
extended
Copy code
- name: prod
  config:
    plugins:
      extractors:
      - name: tap-postgres--inventory
        config:
          default_replication_method: INCREMENTAL
          metadata:
            public-*:
              replication-key: updated_date
              table-key-properties:
              - id
I thought this would solve the problem. Is there another field for the primary key I should be using instead of table-key-properties?
v
https://github.com/MeltanoLabs/Singer-Working-Group/issues/13 is a great write up of why this happens 😄
I think that's what you're hitting, but if it's not could you share the two records that are duplicates? Normally the target you use will upsert based on primary key
c
yep, you're right, that's the singer spec that takes a "at least once" approach, but yeah, the target is supposed to use the id for the upsert (or merge in GBQ case), which was why I was wondering if there is some different metadata tag or some other way to specify the primary key.
as far as sample data, here's one of the smaller tables:
source table looks like:
clearly see it's always the last row getting replicated as you'd expect from a >= where cause on the updated_date...just not sure why the target isn't performing the merge according to the docs.