# troubleshooting
s
Has anyone come across this issue? I have a target BigQuery table and have chosen 'overwrite' for incremental replication. I run a periodic pipeline using Meltano, and my table tends to get dropped every week. Is there a fix for it? Any info or help is appreciated. Here is my target config:
loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_path: key.json
      project: my-project
      dataset: my_dataset
      location: US
      batch_size: 50
      overwrite: true
cc: @alexander_butler @pat_nadolny @Stéphane Burwash
a
overwrite
== drop and replace table
s
What is the alternative to it?
a
if you remove the overwrite option, it is append-only by default
if your incremental only gets new records, then it will work as is
if it gets updated versions of the same record, you can dedupe during transform OR use the merge strategy provided by the target (which also requires denormalized: true), either way works
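For reference, a minimal append-only version of the original config might look like this (a sketch that just drops overwrite: true and reuses the project/dataset values from the first message):
loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_path: key.json
      project: my-project
      dataset: my_dataset
      location: US
      batch_size: 50
      # no overwrite: true, so each run appends new batches instead of dropping and replacing the table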
s
Removing overwrite writes duplicate data on the target table. Do you know if the primary key is determined in the source or the target for the JSON schema?
a
Removing overwrite writes duplicate data on the target table
Use the upsert: true option, it's in the README and is what I meant when I mentioned a merge strategy
determined in the source
Yeah, it comes from the source and is separate from the JSON schema. You can see the schema message spec here
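For context, the primary key travels in the tap's SCHEMA message as key_properties, alongside (but separate from) the JSON schema body. A SCHEMA message looks roughly like this (the stream and property names here are made up for illustration):
{
  "type": "SCHEMA",
  "stream": "contacts",
  "key_properties": ["id"],
  "schema": {
    "type": "object",
    "properties": {
      "id": {"type": "integer"},
      "about_you": {"type": ["string", "null"]}
    }
  }
}
Targets generally use key_properties from this message to decide which columns to merge on when upserting.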
s
Are you suggesting using upsert: true and denormalized: true together? @alexander_butler
a
Yes 👍. overwrite will overwrite the table. upsert will upsert/merge into the table. So you want upsert: true based on what you're asking. And in order to use upsert, you need denormalized: true
Please read the README. Here is a link directly to the part that mentions upsert.
s
According to the README, the schema should have a primary key. My schema doesn't have a primary key. In that case, what would you suggest? Here is a snapshot of my target schema:
a
You need denormalized: true to unpack the data field into separate fields.
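To illustrate what that changes (a sketch; the property names other than about_you are made up), the same record lands roughly like this in each mode:
# denormalized: false (the default): a fixed schema with the whole record in a single JSON column
default_row:
  data: '{"id": 123, "about_you": "free text", "email": "user@example.com"}'
# denormalized: true: one BigQuery column per top-level property in the stream's schema
denormalized_row:
  id: 123
  about_you: "free text"
  email: "user@example.com"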
s
When I use denormalized: true, I come across this error. Not sure if anyone has come across it before.
google.protobuf.json_format.ParseError: Failed to parse properties field: Message type "net.proto2.python.public.target_bigquery.AnonymousProto_5c6c8be168c30eec158d0fb28557d0cfe1b309c6" has no field named "about_you" at "AnonymousProto_96af3397b77c4eb5e9beae2a9d5c7bcc1b9bf1d6.properties"
loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_path: key.json
      project: project1
      dataset: hubspot_data
      location: US
      batch_size: 300
      denormalized: true
a
The way we enable storage_write with BigQuery is tricky. We generate the protobuf dynamically from a JSON schema, but JSON schemas don't follow the same rules as protobuf. I recommend you add method: batch_job to your config too, then it should all work fine. 🙂
loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_path: key.json
      project: project1
      dataset: hubspot_data
      location: US
      batch_size: 100000
      denormalized: true
      method: batch_job
      upsert: true
This should be fine and performant ☝️
s
This works. Thanks a lot 🙂 @alexander_butler