neil_gorman
08/04/2023, 3:58 PM
Using the `batch_job` method, I notice that we are getting 400 errors from the Google API, along with a `can't set attribute` error (first screenshot). The pipeline does not exit, though: it continues to pull data, but no data is loaded and the program never finishes.
When trying to load the data with the `storage_write_api` method, we instead get a loader failed error right away, specifically a `Failed to parse CustomFields field: expected string or bytes-like object` error, and the pipeline exits (full trace in second screenshot).
Given these errors, I believe it may be an issue with data types, but I haven't found a resolution. Has anyone had a similar error to this?
cc: @alexander_butler
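For context: `expected string or bytes-like object` is the `TypeError` message Python's `re` module raises when a pattern is applied to a non-string value, which fits the data-type theory. A minimal sketch that reproduces it; `parse_custom_fields` is a hypothetical stand-in, not the loader's actual code:

```python
import re

# Hypothetical stand-in for loader-side parsing of a CustomFields
# column; assumes the value arrives as a serialized string.
def parse_custom_fields(raw):
    # re.sub accepts only str/bytes; anything else raises
    # "TypeError: expected string or bytes-like object".
    return re.sub(r"\s+", " ", raw).strip()

print(parse_custom_fields('{"key": "value"}'))   # ok: value is a str

try:
    # A record where CustomFields arrived as a dict instead of a
    # serialized string reproduces the error from the trace.
    parse_custom_fields({"key": "value"})
except TypeError as exc:
    print(exc)  # expected string or bytes-like object
```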
alexander_butler
08/04/2023, 6:34 PM
`batch_job` should be set along with a `batch_size` of 100000 or more, depending on the capacity of the node you run on. It's compressed on the fly; it says so in the README.
We should have better defaults, I think: basically `method: batch_job`, `batch_size: 100000`, and `denormalized: false`, since that should be fairly infallible.
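That advice corresponds to loader settings along these lines; a minimal sketch in `meltano.yml` form, where the plugin name and the `project`/`dataset` values are assumed placeholders and only `method`, `batch_size`, and `denormalized` come from the message above:

```yaml
plugins:
  loaders:
    - name: target-bigquery        # assumed plugin name
      config:
        project: my-gcp-project    # placeholder
        dataset: raw               # placeholder
        method: batch_job          # instead of storage_write_api
        batch_size: 100000         # 100k or more, sized to the node
        denormalized: false
```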
neil_gorman
08/07/2023, 1:56 PM
We are still getting the same `'Compressor' object has no attribute '_gzip'` error as before. This appears to happen when the temp tables for these streams are merged into the main table.
For each of these streams we are loading data successfully into the temp tables, so I believe it is an error on the loader side. We have tested batch sizes up to 1 million and set `denormalized` to both `true` and `false`, but in every case the same attribute error arises.
Do you have any insight into what could be causing an error when merging data from the temp table into the main table?
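For reference, this kind of `AttributeError` typically surfaces when late-running code (such as a `__del__`) touches an attribute that a partially completed `__init__` never assigned. A minimal, self-contained sketch; the `Compressor` here is illustrative, not the target's actual implementation:

```python
import gzip
import io

class Compressor:
    """Illustrative stand-in for a loader-side compressor."""

    def __init__(self):
        self.buffer = io.BytesIO()
        # If anything raised above this line, the instance would
        # exist without a _gzip attribute.
        self._gzip = gzip.GzipFile(fileobj=self.buffer, mode="wb")

    def close(self):
        # Runs late in the object's life; touching a never-assigned
        # attribute reproduces the reported error.
        self._gzip.close()

ok = Compressor()
ok.close()  # fine: __init__ completed, so _gzip exists

# Simulate an instance whose __init__ never ran to completion:
broken = object.__new__(Compressor)
try:
    broken.close()
except AttributeError as exc:
    print(exc)  # 'Compressor' object has no attribute '_gzip'
```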
alexander_butler
08/08/2023, 5:21 PM
Can you try removing the `__del__` method from the compressor in `core.py`? It's all I can think of. There should never be a case where `._gzip` is not set: we have an if/else branch in `__init__` that guarantees it is set, so one branch or the other should run. Also, I never saw this issue in all my use. Let me know if that helps?
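One common hardening for exactly this failure mode, whatever the root cause turns out to be, is to make `__del__` tolerate a half-initialized object. A sketch under the assumption that, as described above, the real `__init__` assigns `_gzip` in both branches; the class body here is illustrative, not the actual `core.py`:

```python
import gzip
import io

class Compressor:
    """Sketch of a compressor whose __del__ tolerates a
    half-initialized instance."""

    def __init__(self, use_gzip: bool = True):
        self.buffer = io.BytesIO()
        # An if/else like the one described above: each branch
        # assigns _gzip, so a completed __init__ always sets it.
        if use_gzip:
            self._gzip = gzip.GzipFile(fileobj=self.buffer, mode="wb")
        else:
            self._gzip = None

    def __del__(self):
        # Defensive: getattr with a default cannot raise
        # AttributeError, even if __init__ raised before the
        # assignment ever ran.
        gz = getattr(self, "_gzip", None)
        if gz is not None:
            gz.close()

# Even an instance whose __init__ never ran is now safe to collect:
broken = object.__new__(Compressor)
del broken  # __del__ runs without raising
```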