How do people handle duplications in BigQuery?
# troubleshooting
j
How do people handle duplications in BigQuery? I just ran my ELT again and realized it created duplicate entries for updated rows (I set the incrementalUpdate key to the `last_updated` field).
Is it possible to configure it to respect the primary key but import rows again (overwriting) when they have been updated?
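(For context, an incremental setup like the one described usually looks something like the sketch below in `meltano.yml`; the extractor and stream names here are placeholders, not this project's. With an append-only loader, rows re-extracted because their `last_updated` value changed simply get added again, which is where the duplicates come from.)
```yaml
plugins:
  extractors:
  - name: tap-postgres                  # hypothetical extractor
    metadata:
      public-orders:                    # hypothetical stream name
        replication-method: INCREMENTAL
        replication-key: last_updated   # only rows updated since the last run are re-extracted
```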
e
[suggests trying the PipelineWise variant of target-bigquery (pipelinewise-target-bigquery), which loads against each stream's primary key instead of just appending rows]
j
That sounds very promising, @edgar_ramirez_mondragon. Is there any particular configuration needed to enable that?
e
yup, that variant is not "discoverable" yet, so you have to add it to `meltano.yml` manually. A good start could be:
```yaml
plugins:
  loaders:
  - name: target-bigquery
    namespace: bigquery
    pip_url: pipelinewise-target-bigquery
    settings:
    - name: project_id
      description: BigQuery project
    - name: dataset_id
      description: BigQuery dataset
    - name: default_target_schema
      value: $MELTANO_EXTRACT__LOAD_SCHEMA
      description: BigQuery default dataset for all streams
    - name: location
      value: US
      description: Dataset location
    - name: add_metadata_columns
      kind: boolean
      value: true
      description: Whether to add EL metadata columns
    - name: credentials
      description: Path to Google API credentials file
      env: GOOGLE_APPLICATION_CREDENTIALS
    dialect: bigquery
    target_schema: $MELTANO_EXTRACT__LOAD_SCHEMA
    config:
      credentials: ${MELTANO_PROJECT_ROOT}/.secrets/client_secrets.json
```
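Once the loader is declared, `meltano install` will install it, and you can run it directly with `meltano elt <your-tap> target-bigquery` or wire it into a schedule. A minimal sketch of the latter, assuming a hypothetical extractor and schedule name:
```yaml
schedules:
- name: postgres-to-bigquery      # hypothetical schedule name
  extractor: tap-postgres         # hypothetical extractor
  loader: target-bigquery
  transform: skip
  interval: '@daily'
  start_date: 2021-06-01 00:00:00 # placeholder
```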
j
thx, I'll experiment a bit. Will it do upserts without any special settings?
e
yeah if your streams have a primary key, it'll use that to do upserts
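One caveat: the primary key comes from the `key_properties` the tap emits in its SCHEMA messages, so if a stream doesn't declare one, you may be able to set it yourself through the `metadata` extra, depending on the tap. A hedged sketch with hypothetical extractor and stream names:
```yaml
plugins:
  extractors:
  - name: tap-postgres                # hypothetical extractor
    metadata:
      public-orders:                  # hypothetical stream name
        table-key-properties: [id]    # primary key the loader will upsert on
```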
j
cool. running a test import now
thx for the help!