sweta_garai
07/24/2023, 5:02 PM

alexander_butler
07/24/2023, 5:06 PM
`overwrite` == drop and replace table

sweta_garai
07/24/2023, 5:06 PM

alexander_butler
07/24/2023, 5:07 PM
`append`-only by default

alexander_butler
07/24/2023, 5:07 PM

alexander_butler
07/24/2023, 5:08 PM
`denormalized: true`, either way

sweta_garai
07/24/2023, 5:12 PM

alexander_butler
07/25/2023, 8:36 PM
> Removing overwrite writes duplicate data on the target table
Use the `upsert: true` option. It's in the README, and it's what I meant when I mentioned a merge strategy.
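A minimal in-memory sketch of why re-running an append-only load duplicates rows while an upsert keyed on the primary key does not. The "table" is just a Python list of dicts, and the `id` key and the helper names `append_load`/`upsert_load` are hypothetical; this models the load semantics only, not how target-bigquery talks to BigQuery.

```python
from typing import Dict, List

Row = Dict[str, object]


def append_load(table: List[Row], batch: List[Row]) -> List[Row]:
    """Append-only: every incoming row is added, even if its key already exists."""
    return table + batch


def upsert_load(table: List[Row], batch: List[Row], key: str) -> List[Row]:
    """Upsert/merge: rows with an existing key are replaced, new keys are inserted."""
    merged = {row[key]: row for row in table}
    for row in batch:
        merged[row[key]] = row  # the newest row wins for a given key
    return list(merged.values())


# The same extract is loaded twice, e.g. a full re-sync of one stream.
batch = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]

appended = append_load(append_load([], batch), batch)
upserted = upsert_load(upsert_load([], batch, key="id"), batch, key="id")

print(len(appended))  # 4: every row appears twice
print(len(upserted))  # 2: one row per id
```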
> determined in the source
Yeah, it comes from the source and is separate from the JSON Schema. You can see the schema message spec here
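In the Singer spec, the primary key travels in the `key_properties` field of the `SCHEMA` message, beside the JSON Schema rather than inside it. A small sketch, assuming a made-up `contacts` stream and a hypothetical `merge_keys` helper:

```python
import json
from typing import List

# A Singer SCHEMA message as a tap would write it to stdout (stream and fields are made up).
schema_line = json.dumps({
    "type": "SCHEMA",
    "stream": "contacts",
    "schema": {
        "type": "object",
        "properties": {
            "id": {"type": "string"},
            "email": {"type": ["string", "null"]},
        },
    },
    "key_properties": ["id"],
})


def merge_keys(line: str) -> List[str]:
    """Return the primary-key columns declared by a SCHEMA message."""
    message = json.loads(line)
    if message.get("type") != "SCHEMA":
        raise ValueError("not a SCHEMA message")
    # key_properties sits next to "schema", not inside the JSON Schema document.
    return list(message.get("key_properties", []))


print(merge_keys(schema_line))  # ['id']
```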
sweta_garai
07/31/2023, 6:03 PM

alexander_butler
07/31/2023, 6:43 PM
`overwrite` will overwrite the table. `upsert` will upsert/merge into the table.
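For intuition on what "upsert/merge into the table" means in BigQuery terms, here is a sketch that builds a standard `MERGE` statement from a staging table into the target table on the key columns. The `contacts` table names and the `build_merge_sql` helper are hypothetical, and this is not necessarily the SQL target-bigquery itself issues.

```python
from typing import List


def build_merge_sql(target: str, staging: str, keys: List[str], columns: List[str]) -> str:
    """Build a BigQuery MERGE: update rows whose keys match, insert the rest."""
    on_clause = " AND ".join(f"T.`{k}` = S.`{k}`" for k in keys)
    # Key columns are only matched on, so they are not re-assigned in UPDATE SET.
    set_clause = ", ".join(f"`{c}` = S.`{c}`" for c in columns if c not in keys)
    col_list = ", ".join(f"`{c}`" for c in columns)
    val_list = ", ".join(f"S.`{c}`" for c in columns)

    clauses = [f"MERGE `{target}` T", f"USING `{staging}` S", f"ON {on_clause}"]
    if set_clause:  # a table made only of key columns has nothing to update
        clauses.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    clauses.append(f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})")
    return "\n".join(clauses)


print(build_merge_sql(
    target="project1.hubspot_data.contacts",
    staging="project1.hubspot_data.contacts_staging",
    keys=["id"],
    columns=["id", "email"],
))
```

Running the same staging rows through a statement like this a second time updates the existing keys instead of adding new rows, which is the difference from an append-only load.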
So you want `upsert: true`, based on what you're asking. And in order to use `upsert`, you need `denormalized: true`.
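A rough sketch of the two row shapes, assuming the non-denormalized layout stores each record as a JSON string in a single `data` column (any metadata columns the target adds are left out). One plausible reading of why upsert needs `denormalized: true` is that a merge has to match on the key as a real column, which only exists once the record is unpacked; treat that as an assumption, not the target's documented rationale. The helpers `to_json_row` and `to_denormalized_row` are hypothetical.

```python
import json
from typing import Any, Dict

Record = Dict[str, Any]


def to_json_row(record: Record) -> Dict[str, str]:
    """Assumed non-denormalized shape: the whole record serialized into one `data` column."""
    return {"data": json.dumps(record, sort_keys=True)}


def to_denormalized_row(record: Record) -> Record:
    """Denormalized shape: each top-level property of the record becomes its own column."""
    return dict(record)


record = {"id": "101", "email": "a@example.com"}

print(to_json_row(record))  # {'data': '{"email": "a@example.com", "id": "101"}'}
print("id" in to_json_row(record), "id" in to_denormalized_row(record))  # False True
```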
Please read the README. Here is a link directly to the part that mentions upsert.

sweta_garai
07/31/2023, 7:51 PM

alexander_butler
07/31/2023, 7:52 PM
`denormalized: true` unpacks the `data` field into separate fields.

sweta_garai
07/31/2023, 8:49 PM
google.protobuf.json_format.ParseError: Failed to parse properties field: Message type "net.proto2.python.public.target_bigquery.AnonymousProto_5c6c8be168c30eec158d0fb28557d0cfe1b309c6" has no field named "about_you" at "AnonymousProto_96af3397b77c4eb5e9beae2a9d5c7bcc1b9bf1d6.properties"
loaders:
- name: target-bigquery
  variant: z3z1ma
  pip_url: git+https://github.com/z3z1ma/target-bigquery.git
  config:
    credentials_path: key.json
    project: project1
    dataset: hubspot_data
    location: US
    batch_size: 300
    denormalized: true

alexander_butler
07/31/2023, 9:01 PM
Add `method: batch_job` to your config too, and then it should all work fine. 🙂

alexander_butler
07/31/2023, 9:02 PM
loaders:
- name: target-bigquery
  variant: z3z1ma
  pip_url: git+https://github.com/z3z1ma/target-bigquery.git
  config:
    credentials_path: key.json
    project: project1
    dataset: hubspot_data
    location: US
    batch_size: 100000
    denormalized: true
    method: batch_job
    upsert: true
This should be fine and performant ☝️

sweta_garai
07/31/2023, 11:42 PM
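The `ParseError` in the thread says the record's `properties` object contains `about_you`, a key that the protobuf message generated for the stream does not define. One way to confirm that kind of mismatch is to compare a record against its JSON Schema and list the undeclared keys. This is a hypothetical stdlib-only helper (`undeclared_keys`); the sample schema and record are made up apart from the `properties` and `about_you` names taken from the error, and it ignores `additionalProperties`, reporting only keys not listed under `properties`.

```python
from typing import Any, Dict, List


def undeclared_keys(record: Dict[str, Any], schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return dotted paths of record keys that the schema's `properties` do not declare."""
    declared = schema.get("properties", {})
    missing: List[str] = []
    for key, value in record.items():
        key_path = f"{path}.{key}" if path else key
        if key not in declared:
            missing.append(key_path)
        elif isinstance(value, dict):
            # Recurse into nested objects using their own sub-schema.
            missing.extend(undeclared_keys(value, declared[key], key_path))
    return missing


schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "properties": {
            "type": "object",
            "properties": {"email": {"type": ["string", "null"]}},
        },
    },
}
record = {"id": "101", "properties": {"email": "a@example.com", "about_you": "hi"}}

print(undeclared_keys(record, schema))  # ['properties.about_you']
```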