One more question regarding target bigquery adswerve I have Meltano #plugins-general

One more question regarding target-bigquery (adswe...

nil

12/30/2020, 6:48 PM

One more question regarding target-bigquery (adswerve) . I have set

Copy code

replication_method: truncate

but still i can see that the data in bigquery contains duplicated entries (rows) or multiple version of a row. Is my understanding correct that "truncate" will upsert/merge (no duplicate rows), where as "append" will insert data at the end of table (creating duplicate rows).

douwe_maan

12/30/2020, 8:55 PM

@nil target-bigquery reads the

replication_method

setting here: https://github.com/adswerve/target-bigquery/blob/master/target_bigquery/__init__.py#L45-L47 and uses it here: https://github.com/adswerve/target-bigquery/blob/master/target_bigquery/processhandler.py#L203-L208 to set choose tell BigQuery to use the

TRUNCATE

WriteDispositition

, which works as expected: https://cloud.google.com/bigquery/docs/reference/auditlogs/rest/Shared.Types/WriteDisposition. You should see

"Load {table_name} by FULL_TABLE (truncate)"

logged as well. However, when that

_load_to_bq

method is called from

_do_temp_table_based_load

, it actually only loads the data into a temporary table: https://github.com/adswerve/target-bigquery/blob/master/target_bigquery/processhandler.py#L136, and then copies it into the "real" table using `APPEND`: https://github.com/adswerve/target-bigquery/blob/master/target_bigquery/processhandler.py#L152 So this looks to me like it could be a bug, and I think this last call site should be using the same

truncate=self.truncate if stream not in self.partially_loaded_streams else False

logic used in the loop that calls

_load_to_bq

. If you make that change to the locally installed target-bigquery, does it show the expected truncating behavior? If it does indeed work that way, I suggest filing an issue and contributing directly to https://github.com/adswerve/target-bigquery!

nil

12/30/2020, 9:07 PM

I will check that out tomorrow, to see if that solves the problem. As always thank you for your help

nil

01/04/2021, 9:26 PM

ok i have run with the change you suggested, it does truncate 🙂 but does not do what we thought. so when copy_jobs runs it will actually truncate the table and add the new data. this means we loose all the old data from the BQ. so write_append in copy job will copy the temp_table to BQ table, where as write_truncate will overwrite the BQ table (with data from temp_table) So the change i made was (as we only wanted to test if truncate did what we thought in

write_disposition

)

Copy code

if self.truncate:
  copy_config.write_disposition = WriteDisposition.WRITE_TRUNCATE
else:
  copy_config.write_disposition = WriteDisposition.WRITE_APPEND

douwe_maan

01/04/2021, 9:56 PM

@nil All right, so this is indeed a bug? Will you file an issue/PR on https://github.com/adswerve/target-bigquery? 🙂 Instead of checking

self.truncate

, I think we should use

self.truncate and stream not in self.partially_loaded_streams

like it does in

_load_to_bq

, since partially loaded streams do need to append on copies after the first one, even if the first copy truncated.

nil

01/04/2021, 10:09 PM

how about

Copy code

if self.tables[stream] not in self.partially_loaded_streams and self.truncate 
  copy_config.write_disposition = WriteDisposition.WRITE_TRUNCATE
else 
  copy_config.write_disposition = WriteDisposition.WRITE_APPEND

so it will truncate only if its not in

partially_loaded_streams

and

self.truncate

true

append

otherwise.

douwe_maan

01/04/2021, 10:10 PM

Isn't that what I suggested? 😄 I think

partially_loaded_streams

contains stream names though (like

stream

), not

self.tables[stream]

douwe_maan

01/04/2021, 10:10 PM

Either way, this is probably something the authors of the tap will need to weigh in on so I suggest moving this discussion to their issue/PR tracker 😉

nil

01/04/2021, 10:12 PM

I am running it again with this change, so hopefully will add issue there soon 🙂

Open in Slack

Previous Next