Does schema flattening not work with airbyte taps?...
# troubleshooting
s
Does schema flattening not work with airbyte taps? I'm trying to flatten some objects extracted using airbyte's s3 tap. Here is my config:
Copy code
- name: tap-s3
    variant: airbyte
    capabilities:
      - state
      - stream-maps
    config:
      flattening_enabled: true 
      flattening_max_depth: 2
I'm not seeing objects flattened in the target which is duckdb. I have
data_flattening_max_level
in the duckdb target set to 2 as well:
Copy code
- name: target-duckdb
    variant: jwills
    config:
      add_metadata_columns: true
      batch_size_rows: 50000
      data_flattening_max_level: 2
      path: test.duckdb
1
e
It should at least work on the target side with
data_flattening_max_level
🤔. Do you define the schema manually for tap-s3?
s
no I'm not defining the schema for s3, letting discover/catalog discover it
what's interesting is I tried it on a non airbyte tap, (the hubspot tap) and enabling flattening drops the json fields entirely. Like they don't show up in the destination at all. With the airbyte s3 tap they would show up at least, just not flattened. Here is my config for hubspot tap:
Copy code
- name: tap-hubspot
    variant: spacecowboy
    capabilities:
    - state
    - catalog
    - discover
    - about
    - stream-maps
    config:
      flattening_enabled: true
      flattening_max_depth: 1
e
flattening drops the json fields entirely
what's the json type of the field? I'll try to come up with a reproducer for this, if it turns out to be a 🐛
s
I'm seeing the schema outputted as this when printing to stdout without flattening enabled:
Copy code
{'properties': {}, 'type': ['object', 'null']}
I'll try to find a better example later today too
I also wonder if it's something with the targets. With duckdb target it drops the field, while in snowflake I'm getting a key error with flattening:
Copy code
KeyError: 'properties__about_us'
Probably because some of the kv pairs aren't present in all the records
Still poking around at this but I think it's because during catalog discovery for the s3 tap, the discovered catalog shows just this for nested json fields (note the nested field is called
properties
):
Copy code
'properties': {'type': ['null', 'object']}
When it should look something like this:
Copy code
{'properties': {<kv_pairs>}, 'type': ['object', 'null']}
For hubspot I was looking at the tap code and the catalog is specified in the tap but doesn't breakdown all the k-v pairs in the nested fields so the schema doesn't know what to flatten: https://github.com/YouCruit/tap-hubspot/blob/148ac26fbbd43a3aa68c6c8f1fbad34828763df0/tap_hubspot/streams/companies.py#L25 I'm thinking if I can feed in a custom catalog file with all the fields specified in it, then flattening might work
e
👀 1
s
Yep that worked! Passing in a catalog with all fields specified in it results in the flattened fields in the target
🙌 1
e
Awesome!