How do people handle duplications in BigQuery?
# troubleshooting
j
How do people handle duplications in BigQuery? I just ran my ELT again and realized it created duplicate entries for updated rows (I set the incrementalUpdate key to the `last_updated` field).
Is it possible to configure it to respect the primary key but import rows again (overwriting) when they have been updated?
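(For context, an incremental setup like the one described usually looks something like the sketch below in `meltano.yml`; the extractor and stream names here are placeholders, not this project's. With an append-only loader, rows re-extracted because their `last_updated` value changed simply get added again, which is where the duplicates come from.)
```yaml
plugins:
  extractors:
  - name: tap-postgres                  # hypothetical extractor
    metadata:
      public-orders:                    # hypothetical stream name
        replication-method: INCREMENTAL
        replication-key: last_updated   # only rows updated since the last run are re-extracted
```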
e
[suggests trying the PipelineWise variant of target-bigquery (pipelinewise-target-bigquery), which loads against each stream's primary key instead of just appending rows]
j
That sounds very promising, @edgar_ramirez_mondragon. Is there any particular configuration needed to enable that?
e
yup, that variant is not "discoverable" yet, so you have to add it to `meltano.yml` manually. A good start could be:
```yaml
plugins:
  loaders:
  - name: target-bigquery
    namespace: bigquery
    pip_url: pipelinewise-target-bigquery
    settings:
    - name: project_id
      description: BigQuery project
    - name: dataset_id
      description: BigQuery dataset
    - name: default_target_schema
      value: $MELTANO_EXTRACT__LOAD_SCHEMA
      description: BigQuery default dataset for all streams
    - name: location
      value: US
      description: Dataset location
    - name: add_metadata_columns
      kind: boolean
      value: true
      description: Whether to add EL metadata columns
    - name: credentials
      description: Path to Google API credentials file
      env: GOOGLE_APPLICATION_CREDENTIALS
    dialect: bigquery
    target_schema: $MELTANO_EXTRACT__LOAD_SCHEMA
    config:
      credentials: ${MELTANO_PROJECT_ROOT}/.secrets/client_secrets.json
```
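Once the loader is declared, `meltano install` will install it, and you can run it directly with `meltano elt <your-tap> target-bigquery` or wire it into a schedule. A minimal sketch of the latter, assuming a hypothetical extractor and schedule name:
```yaml
schedules:
- name: postgres-to-bigquery      # hypothetical schedule name
  extractor: tap-postgres         # hypothetical extractor
  loader: target-bigquery
  transform: skip
  interval: '@daily'
  start_date: 2021-06-01 00:00:00 # placeholder
```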
j
thx, I'll experiment a bit. Will it do upserts without any special settings?
e
yeah if your streams have a primary key, it'll use that to do upserts
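One caveat: the primary key comes from the `key_properties` the tap emits in its SCHEMA messages, so if a stream doesn't declare one, you may be able to set it yourself through the `metadata` extra, depending on the tap. A hedged sketch with hypothetical extractor and stream names:
```yaml
plugins:
  extractors:
  - name: tap-postgres                # hypothetical extractor
    metadata:
      public-orders:                  # hypothetical stream name
        table-key-properties: [id]    # primary key the loader will upsert on
```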
j
cool. running a test import now
thx for the help!