# troubleshooting
s
Has anyone come across this issue? I have a target BigQuery table and have chosen 'overwrite' for incremental replication. I run a periodic pipeline using Meltano, and my table tends to get dropped every week. Is there a fix for it? Any info or help is appreciated. Here is my target config:
loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_path: key.json
      project: my-project
      dataset: my_dataset
      location: US
      batch_size: 50
      overwrite: true
cc: @alexander_butler @pat_nadolny @Stéphane Burwash
a
overwrite
== drop and replace table
s
What is the alternative to it?
a
if you remove the overwrite option, it is append-only by default
if your incremental only gets new records, then it will work as is
if it gets updated versions of the same record, you can dedupe during transform OR use the merge strategy provided by the target (which also requires denormalized: true), either way works
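For reference, a minimal append-only version of the original config might look like this (a sketch that just drops overwrite: true and reuses the project/dataset values from the first message):
loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_path: key.json
      project: my-project
      dataset: my_dataset
      location: US
      batch_size: 50
      # no overwrite: true, so each run appends new batches instead of dropping and replacing the table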
s
Removing overwrite writes duplicate data on the target table. Do you know if the primary key is determined in the source or the target for the JSON schema?
a
Removing overwrite writes duplicate data on the target table
Use the upsert: true option, it's in the README and is what I meant when I mentioned a merge strategy
determined in the source
Yeah, it comes from the source and is separate from the JSON schema. You can see the schema message spec here
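For context, the primary key travels in the tap's SCHEMA message as key_properties, alongside (but separate from) the JSON schema body. A SCHEMA message looks roughly like this (the stream and property names here are made up for illustration):
{
  "type": "SCHEMA",
  "stream": "contacts",
  "key_properties": ["id"],
  "schema": {
    "type": "object",
    "properties": {
      "id": {"type": "integer"},
      "about_you": {"type": ["string", "null"]}
    }
  }
}
Targets generally use key_properties from this message to decide which columns to merge on when upserting.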
s
Are you suggesting using upsert: true and denormalized: true together? @alexander_butler
a
Yes 👍. overwrite will overwrite the table. upsert will upsert/merge into the table. So you want upsert: true based on what you're asking. And in order to use upsert, you need denormalized: true
Please read the README. Here is a link directly to the part that mentions upsert.
s
According to the README, the schema should have a primary key. My schema doesn't have a primary key. In that case, what would you suggest? Here is a snapshot of my target schema:
a
You need denormalized: true to unpack the data field into separate fields.
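To illustrate what that changes (a sketch; the property names other than about_you are made up), the same record lands roughly like this in each mode:
# denormalized: false (the default): a fixed schema with the whole record in a single JSON column
default_row:
  data: '{"id": 123, "about_you": "free text", "email": "user@example.com"}'
# denormalized: true: one BigQuery column per top-level property in the stream's schema
denormalized_row:
  id: 123
  about_you: "free text"
  email: "user@example.com"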
s
When I use denormalized: true, I come across this error. Not sure if anyone has come across it before.
google.protobuf.json_format.ParseError: Failed to parse properties field: Message type "net.proto2.python.public.target_bigquery.AnonymousProto_5c6c8be168c30eec158d0fb28557d0cfe1b309c6" has no field named "about_you" at "AnonymousProto_96af3397b77c4eb5e9beae2a9d5c7bcc1b9bf1d6.properties"
loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_path: key.json
      project: project1
      dataset: hubspot_data
      location: US
      batch_size: 300
      denormalized: true
a
The way we enable storage_write with BigQuery is tricky. We generate the protobuf dynamically from a JSON schema, but JSON schemas don't follow the same rules as protobuf. I recommend you add method: batch_job to your config too, then it should all work fine. 🙂
loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_path: key.json
      project: project1
      dataset: hubspot_data
      location: US
      batch_size: 100000
      denormalized: true
      method: batch_job
      upsert: true
This should be fine and performant ☝️
s
This works. Thanks a lot 🙂 @alexander_butler