# troubleshooting
n
Hey all, getting a few loader errors when trying to load Campaign Monitor data using target-bigquery: the target hangs once it starts loading data from the tap and won't exit. When setting the loader method to `batch_job`, I notice we get certain 400 errors from the Google API and a `can't set attribute` error (first screenshot). However, the pipeline will not exit; it continues to pull data, but no data is loaded and the program never finishes. When trying to load the data with the `storage_write_api` method, we instead get a loader failed error right away, specifically a `Failed to parse CustomFields field: expected string or bytes-like object` error, and the pipeline exits (full trace in the second screenshot). Given the errors we are getting, I believe it may be an issue with data types, but I haven't found a resolution. Has anyone had a similar error to this? cc: @alexander_butler
a
Are you using the snake case option?
n
Not currently, should we?
a
No. That was just related to a schema evolution error I'd heard about. If you want to go with the most bulletproof load method first, my recommendation is batch job and denormalized false. Storage write should ship with a disclaimer that I would only recommend it for simple JSON schemas, because the runtime translation to protobuf is not straightforward. So for simple data like Salesforce or flat structures like databases it's fine. Even then, I find batch job can surprisingly outperform it.
The jobs attempts thing should have already been fixed. If you're certain you are running on main I can give it a look. Maybe it was reverted?
n
I set denormalized to false using `batch_job` and that solved the 400 issue. I'll let you know if we are still facing the hanging issue once the data finishes loading. Thank you for the help!
Noticed a new issue: after loading data for a few minutes we get the following error: `OSError: [Errno 24] Too many open files`. The extractor appears to make a few more API calls before stalling out, and no more data is loaded. Have you seen this error before?
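(Side note for anyone hitting the same thing: errno 24 means the process ran out of file descriptors. A quick, hedged way to inspect and, up to the hard limit, raise the per-process limit from Python on POSIX systems, independent of anything target-bigquery does:)

```python
import resource

# POSIX-only: inspect the per-process open-file limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft limit={soft}, hard limit={hard}")

# Optionally raise the soft limit for this process
# (cannot exceed the hard limit without elevated privileges).
resource.setrlimit(resource.RLIMIT_NOFILE, (min(65536, hard), hard))
```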
Update on this: after stalling for 20ish minutes we get a loader error: `AttributeError: 'Compressor' object has no attribute '_gzip'`
a
Ah, the batch size is probably very low, isn't it? `batch_job` should be set along with a `batch_size: 100000` or more, depending on the capacity of the node you run on. It's compressed on the fly. It says so in the readme. We should have better defaults, I think: basically `method: batch_job`, `batch_size: 100000`, and `denormalized: false`, since that should be fairly infallible. So there's less experimentation 🤷
Then again, anyone could PR it if they cared enough.
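(For reference, a minimal sketch of those recommended settings as a standalone Singer config file; only the options discussed in this thread are shown, and connection settings plus exact key names should be checked against the target-bigquery README. In Meltano, the same keys would live under the loader's `config` block.)

```python
import json

# Hypothetical config.json for running the target directly, e.g.
#   target-bigquery --config config.json
config = {
    "method": "batch_job",    # most bulletproof load method per the advice above
    "batch_size": 100_000,    # or more, depending on the node's capacity
    "denormalized": False,    # avoids the schema-translation 400 errors above
}

with open("config.json", "w") as handle:
    json.dump(config, handle, indent=2)
```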
n
Perfect, that appears to have done the trick!
One other issue I found: it looks like with every pipeline run a new table is being created rather than loading into the previous table. Any ideas on what's causing this?
a
Those are temp tables used to atomically update the target table. They have an expiration and will clear themselves automatically.
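(If you want to confirm those really are expiring temp tables, one hedged way to check is to list each table's expiration with the official BigQuery client; the dataset name below is a placeholder:)

```python
from google.cloud import bigquery

client = bigquery.Client()

# "my-project.my_dataset" is a placeholder for the target dataset.
for item in client.list_tables("my-project.my_dataset"):
    table = client.get_table(item.reference)
    # Temp tables created by the loader should show an expiration timestamp;
    # the real target table should show None.
    print(f"{table.table_id}: expires={table.expires}")
```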
n
Hey @alexander_butler, for certain streams we are getting a similar `'Compressor' object has no attribute '_gzip'` error as before. This appears to be an issue when the temp tables for these streams are merged into the main table. For each of these streams we are loading data successfully into the temp tables, so I believe it is an error on the loader side. We have tested up to a batch size of 1 million and set denormalized to both `true` and `false`, but in every case the same attribute error arises. Do you have any insight into what could be causing an error when merging data from the temp table into the main table?
Hey @alexander_butler, sorry for all the pings, but any ideas on this? It is only happening for a select few larger tables, which makes me think it's an issue with batch size, but we have yet to find a solution; we have tried increasing the batch size and testing the other methods.
a
Can you test removing the `__del__` method from the compressor in `core.py`? It's all I can think of. There should never be a case where `._gzip` is not set; we have an if-else branch in the `__init__` that guarantees it's set, so one branch or the other should run. Also, I never saw this issue in all my use. Let me know if that helps?
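(A minimal sketch of the failure mode being discussed, assuming nothing about the real Compressor beyond what is described in this thread: if `__init__` raises or is interrupted before the branch that assigns `_gzip`, Python still calls `__del__` on the half-built object, which is one way a missing `_gzip` attribute can surface.)

```python
import gzip
import io


class Compressor:
    """Hypothetical stand-in, not the actual target-bigquery class."""

    def __init__(self, path=None):
        if path is not None and not isinstance(path, str):
            # Raising here exits __init__ before either branch assigns _gzip.
            raise TypeError("path must be a string")
        if path is None:
            self._buffer = io.BytesIO()
            self._gzip = gzip.GzipFile(fileobj=self._buffer, mode="wb")
        else:
            self._gzip = gzip.open(path, "wb")

    def __del__(self):
        # Runs even when __init__ raised, so the lookup can fail with
        # AttributeError: 'Compressor' object has no attribute '_gzip'.
        # A defensive alternative: gz = getattr(self, "_gzip", None).
        self._gzip.close()


# Demonstration of the hazard: construction fails, the half-built object is
# garbage collected, and Python reports the AttributeError from __del__ as
# "Exception ignored in: <function Compressor.__del__ ...>".
try:
    Compressor(path=123)
except TypeError:
    pass
```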