Does schema flattening not work with airbyte taps I m trying Meltano #troubleshooting

Does schema flattening not work with airbyte taps?...

steven_wang

07/18/2024, 4:51 PM

Does schema flattening not work with airbyte taps? I'm trying to flatten some objects extracted using airbyte's s3 tap. Here is my config:

Copy code

- name: tap-s3
    variant: airbyte
    capabilities:
      - state
      - stream-maps
    config:
      flattening_enabled: true 
      flattening_max_depth: 2

I'm not seeing objects flattened in the target which is duckdb. I have

data_flattening_max_level

in the duckdb target set to 2 as well:

Copy code

- name: target-duckdb
    variant: jwills
    config:
      add_metadata_columns: true
      batch_size_rows: 50000
      data_flattening_max_level: 2
      path: test.duckdb

✅ 1

Edgar Ramírez (Arch.dev)

07/18/2024, 9:14 PM

It should at least work on the target side with

data_flattening_max_level

🤔. Do you define the schema manually for tap-s3?

steven_wang

07/18/2024, 10:01 PM

no I'm not defining the schema for s3, letting discover/catalog discover it

steven_wang

07/18/2024, 10:04 PM

what's interesting is I tried it on a non airbyte tap, (the hubspot tap) and enabling flattening drops the json fields entirely. Like they don't show up in the destination at all. With the airbyte s3 tap they would show up at least, just not flattened. Here is my config for hubspot tap:

Copy code

- name: tap-hubspot
    variant: spacecowboy
    capabilities:
    - state
    - catalog
    - discover
    - about
    - stream-maps
    config:
      flattening_enabled: true
      flattening_max_depth: 1

Edgar Ramírez (Arch.dev)

07/19/2024, 4:49 AM

flattening drops the json fields entirely

what's the json type of the field? I'll try to come up with a reproducer for this, if it turns out to be a 🐛

steven_wang

07/19/2024, 4:02 PM

I'm seeing the schema outputted as this when printing to stdout without flattening enabled:

Copy code

{'properties': {}, 'type': ['object', 'null']}

I'll try to find a better example later today too

steven_wang

07/19/2024, 4:04 PM

I also wonder if it's something with the targets. With duckdb target it drops the field, while in snowflake I'm getting a key error with flattening:

Copy code

KeyError: 'properties__about_us'

Probably because some of the kv pairs aren't present in all the records

steven_wang

07/30/2024, 5:47 PM

Still poking around at this but I think it's because during catalog discovery for the s3 tap, the discovered catalog shows just this for nested json fields (note the nested field is called

properties

Copy code

'properties': {'type': ['null', 'object']}

When it should look something like this:

Copy code

{'properties': {<kv_pairs>}, 'type': ['object', 'null']}

For hubspot I was looking at the tap code and the catalog is specified in the tap but doesn't breakdown all the k-v pairs in the nested fields so the schema doesn't know what to flatten: https://github.com/YouCruit/tap-hubspot/blob/148ac26fbbd43a3aa68c6c8f1fbad34828763df0/tap_hubspot/streams/companies.py#L25 I'm thinking if I can feed in a custom catalog file with all the fields specified in it, then flattening might work

Edgar Ramírez (Arch.dev)

07/30/2024, 6:35 PM

https://docs.meltano.com/concepts/plugins/#catalog-extra might be the right thing for you

👀 1

steven_wang

07/30/2024, 7:37 PM

Yep that worked! Passing in a catalog with all fields specified in it results in the flattened fields in the target

🙌 1

Edgar Ramírez (Arch.dev)

07/30/2024, 7:38 PM

Awesome!

17 Views

Open in Slack

Previous Next