thomas_schmidt
08/17/2021, 5:12 PM
target-bigquery with different taps. My specific example is tap-stripe (singer-io variant, git+<https://github.com/singer-io/tap-stripe.git@v1.4.8>).
When running
TARGET_BIGQUERY_DATASET_ID=meltano_stripe meltano elt tap-stripe target-bigquery
it raises the following error:
target-bigquery | CRITICAL 'RECORD'
target-bigquery | CRITICAL Traceback (most recent call last):
  File "/Users/thomas/Agrando/agr-meltano/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/__init__.py", line 103, in main
    for state in state_iterator:
  File "/Users/thomas/Agrando/agr-meltano/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/process.py", line 54, in process
    for s in handler.handle_record_message(msg):
  File "/Users/thomas/Agrando/agr-meltano/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/processhandler.py", line 177, in handle_record_message
    nr = format_record_to_schema(nr, self.bq_schema_dicts[stream])
  File "/Users/thomas/Agrando/agr-meltano/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/schema.py", line 389, in format_record_to_schema
    record[k] = conversion_dict[bq_schema[k]["type"]](v)
KeyError: 'RECORD'
meltano | Loading failed (2): CRITICAL ['Traceback (most recent call last):\n', ... "KeyError: 'RECORD'\n"]
meltano | ELT could not be completed: Loader failed
We figured out that this seems to have to do with how the schema is defined in the catalog: e.g. sometimes fields are of type object and have no properties defined.
We managed to patch this with a script that parses catalog.json and replaces the affected parts of the schema.
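A catalog patch of this kind could look roughly like the sketch below. The actual script isn't shown in the thread, so everything here is an assumption: the function names are hypothetical, and replacing a property-less object with a nullable string is just one possible choice (passed in as a parameter). Whether target-bigquery then turns dict values for such a STRING column into JSON text, rather than failing or writing a Python repr, would need to be checked.

```python
import copy
import json
from typing import Any

# Hypothetical replacement for object nodes that declare no properties.
DEFAULT_FALLBACK = {"type": ["null", "string"]}


def is_schemaless_object(node: Any) -> bool:
    """True if `node` is an object-typed JSON Schema node with no properties."""
    if not isinstance(node, dict):
        return False
    types = node.get("type", [])
    if isinstance(types, str):
        types = [types]
    return isinstance(types, list) and "object" in types and not node.get("properties")


def patch_schema(node: Any, fallback: dict = DEFAULT_FALLBACK) -> Any:
    """Return a copy of `node` with every schemaless object replaced by `fallback`."""
    if is_schemaless_object(node):
        return copy.deepcopy(fallback)
    if isinstance(node, dict):
        return {key: patch_schema(value, fallback) for key, value in node.items()}
    if isinstance(node, list):
        return [patch_schema(item, fallback) for item in node]
    return node


def patch_catalog(catalog: dict) -> dict:
    """Patch the schema of every stream in a Singer catalog (mutates and returns it)."""
    for stream in catalog.get("streams", []):
        stream["schema"] = patch_schema(stream["schema"])
    return catalog


def patch_catalog_file(path: str) -> None:
    """Rewrite a catalog.json file in place with patched stream schemas."""
    with open(path) as f:
        catalog = json.load(f)
    patch_catalog(catalog)
    with open(path, "w") as f:
        json.dump(catalog, f, indent=2)
```

The root stream schema is left alone because it has properties; only nested property-less objects (including array `items`) are swapped out.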
However, some fields still cause issues. For example, when extracting only customers.* with
sh extract/tap-stripe-patch/patch-tap-stripe-catalog.sh && TARGET_BIGQUERY_DATASET_ID=meltano_stripe meltano elt tap-stripe target-bigquery --catalog=extract/tap-stripe-patch/tap-stripe-catalog.json
we get the same error. We found that the customers.sources field is one of the causes. But even when we deselect it, it still seems to cause the error. When are deselected fields excluded? Shouldn't the tap already take care of that?
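In the Singer spec, field selection lives in each stream's catalog metadata: an entry with breadcrumb ["properties", "<field>"] carries keys such as selected, selected-by-default and inclusion, and fields with inclusion "automatic" are always emitted. Actually dropping a deselected field from the SCHEMA and RECORD messages is up to the tap, so a tap that writes its full schema regardless would still send sources to the target; whether singer-io tap-stripe v1.4.8 behaves that way is not verified here. A minimal sketch of the selection logic, with hypothetical helper names:

```python
def set_field_selected(catalog: dict, stream_id: str, field: str, selected: bool) -> None:
    """Set `selected` on the metadata entry for a top-level property of one stream."""
    breadcrumb = ["properties", field]
    for stream in catalog["streams"]:
        if stream["tap_stream_id"] != stream_id:
            continue
        entries = stream.setdefault("metadata", [])
        for entry in entries:
            if entry["breadcrumb"] == breadcrumb:
                md = entry.setdefault("metadata", {})
                if md.get("inclusion") == "automatic" and not selected:
                    raise ValueError(f"{field!r} has inclusion=automatic and cannot be deselected")
                md["selected"] = selected
                return
        # No metadata entry for this field yet: add one.
        entries.append({"breadcrumb": breadcrumb, "metadata": {"selected": selected}})
        return
    raise KeyError(stream_id)


def effective_fields(stream: dict) -> set:
    """Top-level properties that a tap honoring the metadata would emit."""
    fields = set()
    for entry in stream.get("metadata", []):
        crumb = entry["breadcrumb"]
        if len(crumb) != 2 or crumb[0] != "properties":
            continue  # stream-level or nested breadcrumbs are ignored here
        md = entry.get("metadata", {})
        inclusion = md.get("inclusion", "available")
        if inclusion == "automatic":
            fields.add(crumb[1])
        elif inclusion == "unsupported":
            continue
        elif md.get("selected", md.get("selected-by-default", False)):
            fields.add(crumb[1])
    return fields


def filter_record(record: dict, fields: set) -> dict:
    """Drop every top-level key that is not in `fields`."""
    return {k: v for k, v in record.items() if k in fields}
```

The same filter would have to be applied to the schema's properties as well; filtering only the records still leaves the target building a BigQuery column for sources.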
Maybe you have some tips and tricks for us. This is currently a big blocker: of the taps we need, the only one we have managed to get running with BigQuery is tap-gitlab.
aaronsteers
08/17/2021, 6:33 PM
Re: "We figured out that this seems to have to do with how the schema is defined in the catalog: e.g. sometimes fields are type object and have no properties defined."
There's a discussion going on here regarding variant object types in the SDK. While, technically speaking, JSON Schema allows variant objects (objects with no defined properties), I am not sure which targets (if any) support this, either at the top level of the stream (unlikely) or in nested nodes (more feasible, but also unsure).
aaronsteers
08/17/2021, 6:36 PM
Re: "KeyError: 'RECORD'"
I could be wrong, but this specific key error seems to point to something other than a missing property definition. I'd expect a key error on something like 'cust_id' or 'address', or some other key in the subnodes of the record. It looks like it is having a problem parsing the RECORD message itself, which could indicate a bug in the tap or just confusing logging in the target.
aaronsteers
08/17/2021, 6:37 PM
--log-level=debug flag by chance? This might produce additional hints.
thomas_schmidt
08/18/2021, 4:56 AM
target-bigquery there is a line
record[k] = conversion_dict[bq_schema[k]["type"]](v)
which leads to the error. When I put a breakpoint there, we have the following situation:
k = 'sources'
bq_schema[k] = {'type': 'RECORD', 'mode': 'NULLABLE', 'fields': []}
bq_schema[k]["type"] = 'RECORD'
conversion_dict = {
'BYTES': <class 'bytes'>,
'STRING': <class 'str'>,
'TIME': <class 'str'>,
'TIMESTAMP': <class 'str'>,
'DATE': <class 'str'>,
'DATETIME': <class 'str'>,
'FLOAT': <class 'float'>,
'NUMERIC': <class 'float'>,
'BIGNUMERIC': <class 'float'>,
'INTEGER': <class 'int'>,
'BOOLEAN': <class 'bool'>,
'GEOGRAPHY': <class 'str'>,
'DECIMAL': <class 'str'>,
'BIGDECIMAL': <class 'str'>
}
So conversion_dict has no conversion for the type RECORD.
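Since RECORD is deliberately left out of conversion_dict and handled recursively (per the target's docstring), the KeyError means the sources value reached the plain conversion lookup instead of the recursive branch. The sketch below is a simplified, hypothetical model of such a formatter, not target-bigquery's actual code: if the real code follows a similar pattern, KeyError: 'RECORD' (rather than a KeyError on a sub-key like 'id', which is what a missing property definition would produce) suggests the sources value was not a dict, for example a list. That is a hypothesis, not something confirmed in this thread.

```python
# Simplified, hypothetical model of a BigQuery record formatter. It is NOT
# target-bigquery's actual code; it only mirrors the shape of the failing line.
CONVERSIONS = {"STRING": str, "INTEGER": int, "FLOAT": float, "BOOLEAN": bool}


def format_record(record: dict, bq_fields: dict) -> dict:
    """Convert values in place; `bq_fields` maps field name -> BigQuery field dict."""
    for key, value in record.items():
        field = bq_fields[key]
        if value is None:
            continue
        if field["type"] == "RECORD" and isinstance(value, dict):
            # RECORD is handled recursively, never through CONVERSIONS.
            sub_fields = {f["name"]: f for f in field["fields"]}
            record[key] = format_record(value, sub_fields)
        else:
            # A RECORD field whose value is NOT a dict (e.g. a list) ends up
            # here, and this lookup raises KeyError: 'RECORD'.
            record[key] = CONVERSIONS[field["type"]](value)
    return record
```

With the field seen at the breakpoint, {'type': 'RECORD', 'mode': 'NULLABLE', 'fields': []}, a list value reproduces KeyError: 'RECORD', while a dict with an undeclared key such as 'id' would instead fail with KeyError: 'id'.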
In the function docstring it says:
RECORD is not included into conversion_dict - it is done on purpose. RECORD is handled recursively.
Looking into the catalog, the sources field is defined as follows:
"sources": {
"anyOf": [
{
"type": [
"null",
"array"
],
"items": {
"type": [
"null",
"object"
],
"properties": {...}
}
},
{
"type": [
"null",
"object"
],
"properties": {...}
}
]
},
So it seems that something goes wrong somewhere in the schema conversion.
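The target turned this anyOf into a RECORD with an empty field list. One possible pre-processing step, in the spirit of the catalog patch script mentioned earlier, is to collapse each anyOf into a single branch before the catalog reaches the target. This is a hypothetical sketch (collapse_any_of is a made-up name); preferring the array branch assumes the tap actually emits sources as a list, which should be checked against real records first, since the other branches are discarded.

```python
from typing import Any, List


def _types(node: dict) -> List[str]:
    """Normalize a JSON Schema "type" value to a list."""
    t = node.get("type", [])
    return [t] if isinstance(t, str) else list(t)


def collapse_any_of(node: Any, prefer: str = "array") -> Any:
    """Replace every anyOf with its first branch whose type includes `prefer`.

    Falls back to the first branch if none matches. Other keys on the anyOf
    node are kept; keys from the chosen branch win on conflict.
    """
    if isinstance(node, list):
        return [collapse_any_of(item, prefer) for item in node]
    if not isinstance(node, dict):
        return node
    if "anyOf" in node:
        branches = node["anyOf"]
        chosen = next((b for b in branches if prefer in _types(b)), branches[0])
        rest = {k: v for k, v in node.items() if k != "anyOf"}
        return collapse_any_of({**rest, **chosen}, prefer)
    return {k: collapse_any_of(v, prefer) for k, v in node.items()}
```

Passing prefer="object" keeps the object branch instead; which one is right depends on what the tap really sends for sources.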
However, I still wonder why deselecting the field has no effect. Shouldn't the tap then also exclude this field from both the schema and the record data? Sorry, I am pretty new to the Singer spec.
thomas_schmidt
08/18/2021, 8:01 AM
aaronsteers
08/18/2021, 1:53 PM
aaronsteers
08/18/2021, 1:56 PM
thomas_schmidt
08/18/2021, 2:34 PM
tap-stripe but it hasn't been updated for quite a while. Do you by chance know what the best approach to contributing would be here? I already have some schema filtering running locally.
thomas_schmidt
08/19/2021, 7:46 AM
aaronsteers
08/19/2021, 4:22 PM
aaronsteers
08/19/2021, 4:24 PM
thomas_schmidt
08/19/2021, 5:50 PM
edgar_ramirez_mondragon
08/19/2021, 6:28 PM
prratek_ramchandani
08/19/2021, 6:32 PM
thomas_schmidt
08/20/2021, 6:56 AM
prratek_ramchandani
08/20/2021, 1:07 PM
daniel_luftspring
02/24/2022, 9:01 PM
tap-hellobaton is having with target-bigquery. The error message is identical to the one listed here. From what I'm gathering, the target doesn't handle nullable nested types very well. They aren't using a catalog at all, so I'm wondering if anyone here might be able to talk me through the fix to tap-stripe, or at least point me in the right direction.
prratek_ramchandani
02/24/2022, 9:41 PM
object but don't specify types or the nested properties, so you'd want to make sure any object-type field has properties, where each property also specifies a type.
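That advice can be turned into a quick check over a stream schema before running the pipeline. A minimal sketch, assuming a plain JSON Schema dict; the function name and the path notation are made up for illustration. It reports object nodes without properties and nodes that declare no type, and descends into properties, array items and each anyOf branch.

```python
from typing import Any, Iterator


def find_untyped_nodes(node: Any, path: str = "$") -> Iterator[str]:
    """Yield a description of every schema node a strict target may reject."""
    if not isinstance(node, dict):
        return
    types = node.get("type")
    if types is None and "anyOf" not in node:
        yield f"{path}: no type"
    type_list = [types] if isinstance(types, str) else (types or [])
    if "object" in type_list and not node.get("properties"):
        yield f"{path}: object without properties"
    for name, child in (node.get("properties") or {}).items():
        yield from find_untyped_nodes(child, f"{path}.{name}")
    if "items" in node:
        yield from find_untyped_nodes(node["items"], f"{path}[]")
    for i, branch in enumerate(node.get("anyOf", [])):
        yield from find_untyped_nodes(branch, f"{path}<anyOf {i}>")
```

Running it over each stream's schema in a catalog gives a list of fields to fix (or to coerce) before loading.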
daniel_luftspring
02/24/2022, 10:09 PM
prratek_ramchandani
02/24/2022, 11:35 PM
force-fields config option for target-bigquery to coerce the field to a string, and then downstream use BigQuery's JSON functions to parse out the nested fields you care about.
daniel_luftspring
02/25/2022, 8:13 PM
force-fields option works in practice. Thank you @prratek_ramchandani!
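The force-fields workaround stores the nested value as a string and parses it downstream. The round trip can be sketched in Python; coerce_to_json_string and extract_scalar are hypothetical helpers, not target-bigquery code, and the exact force-fields configuration syntax isn't shown in this thread, so check the target's README for it. In BigQuery, the counterpart of extract_scalar(value, "data", 0, "id") would be something like JSON_EXTRACT_SCALAR(sources, '$.data[0].id'), which returns a STRING rather than a typed value.

```python
import json
from typing import Any


def coerce_to_json_string(record: dict, fields: set) -> dict:
    """Serialize the given top-level fields to JSON text; None stays None."""
    return {
        k: json.dumps(v) if k in fields and v is not None else v
        for k, v in record.items()
    }


def extract_scalar(json_text: str, *path: Any) -> Any:
    """Parse JSON text and walk it by dict keys / list indices."""
    value = json.loads(json_text)
    for step in path:
        value = value[step]
    return value
```

Keeping the raw JSON in one column means schema drift in the nested object no longer breaks the load; the cost is that every downstream query has to extract the fields it needs.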