# singer-tap-development
s
Hi All, I have a vendor who keeps adding data points to the response. I was using a JSON Schema for this, but I'm learning that I need to update it often due to changes in the product, and I will be missing backfill data until I make those changes. I would like to know if there is a way to handle this. The schema looks like this:
```json
{
  "$schema": "<http://json-schema.org/draft-07/schema#>",
  "title": "Generated schema for Root",
  "type": ["object", "null"],
  "properties": {
    "LogFileName": {
      "type": ["string", "null"]
    },
    "LastModified": {
      "type": ["string", "null"]
    },
    "insertId": {
      "type": ["string", "null"]
    },
    "timestamp": {
      "type": ["string", "null"]
    },
    "jsonPayload": {
      "type": ["object", "null"],
      "properties": {
        "ENTITY 1": {},
        "ENTITY 2": {},
        "ENTITY 3": {},
        "ENTITY 4": {},
        "ENTITY 5": {},
        "ENTITY 6": {},
        "ENTITY 7": {},
        // this list keeps growing, and a given response will contain only one of the entities, not all of them
        "ENTITY N": {}
      }
    }
  }
}
```
I am also flattening the schema to reduce the Snowflake compute cost for the transformation, as the volume is huge.
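A minimal sketch of how the flattening could be configured in meltano.yml, assuming the tap is built on the Meltano SDK and exposes the standard flattening settings (the tap name is a placeholder):
```yaml
# meltano.yml sketch: SDK-level flattening for a hypothetical tap
plugins:
  extractors:
    - name: tap-myvendor           # placeholder tap name
      config:
        flattening_enabled: true
        flattening_max_depth: 1    # flatten only the first level, e.g. jsonPayload.*
```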
e
Have you tried using `additionalProperties` in `jsonPayload`? That way you wouldn't need to enumerate the properties and lose data when new ones are added.
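Something like this, as a minimal sketch of the `jsonPayload` part of your schema (everything else stays as you have it):
```json
"jsonPayload": {
  "type": ["object", "null"],
  "additionalProperties": true
}
```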
s
I have, and it works well. One more issue: it is saved as VARCHAR in Snowflake no matter what I do, whether I call it an object or give it format json with a string type.
Do you know how to save it as a VARIANT rather than a VARCHAR?
e
I'm AFK at the moment, but I'd probably have to look at how target-snowflake decides to map a type to `VARCHAR`.
s
@Edgar Ramírez (Arch.dev) will `additionalProperties` not work in debug mode of the tap? I see a column-ignored message even after adding `additionalProperties: true`:
`Properties ('has_meeting_summary',) were present in the 'meetingmetrics' stream but not found in catalog schema. Ignoring.`
m
To save it as a VARIANT you might need to override the schema for that column in meltano.yml:
```yaml
schema:
  stream_name:
    column_name:
      type:
        - object
```
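In context it would look roughly like this (a sketch using the stream and column names from above; the tap name is a placeholder):
```yaml
plugins:
  extractors:
    - name: tap-myvendor        # placeholder tap name
      schema:
        meetingmetrics:         # stream name
          jsonPayload:          # column to keep as an object
            type:
              - object
              - "null"
```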
s
Thanks for the input @Matt Menzenski, I will try this. Do you mean you need to set this even when the column type is already object in the schema?
m
“In the schema” where? If you are setting the schema explicitly to “object” in the tap it may not be necessary.
s
OK, I figured it out on `additionalProperties`: it looks like the message
`Properties ('has_meeting_summary',) were present in the 'meetingmetrics' stream but not found in catalog schema. Ignoring.`
is emitted even after `additionalProperties` is added to the schema.
I have added an object type in the JSON schema file, but somehow Snowflake still treats it as a string, or it flattens the object. I will test more and update.
e
The SDK v0.37.0+ should not drop fields from `object`s marked with `additionalProperties`: https://github.com/meltano/sdk/pull/2301
s
Yep, saw this today; I need to test more and see what is causing this.
On the tap side, I can confirm the message is inaccurate: the data is added to the record that is emitted, but the message is still logged because the property is marked under `unmapped_properties` in the code rather than skipped when `additionalProperties` is set. https://github.com/meltano/sdk/blob/4674b3f3ddbfeeb9a38588f52b8ac72eb80c61c3/singer_sdk/helpers/_typing.py#L434
e
Ah ok. So the field isn't actually dropped and the warning is inaccurate.
s
Yes, I have to validate why the target is skipping it, as it is not sent to the DB.
Singer Tap Emit
@Edgar Ramírez (Arch.dev), I think this is the reason for the failure. When you set `additionalProperties` to `true`, the data is not skipped, but the schema is not updated, which results in the output appearing as it does. The Snowflake sink's database setup is triggered at the beginning, but the schema changes in the middle of the data extraction. When the merge query is built, it only includes the columns from the initial schema and disregards any newly added columns in the actual record. Here is my fix for this: https://github.com/meltano/sdk/compare/main...SidduHussain:sdk:additionalProperties-Schema-fix
I would like to know whether the schema update is expected to be automatic without the schema changes here, and whether this change is needed. I have a couple of questions:
- If schema flattening is done at the tap, can we emit multiple SCHEMA messages for the same stream, and how will the target honor the newly emitted schema?
- If the schema is flattened at the target, will other targets accept this gracefully?
I have tested it on Snowflake alone, and I can make it work with a small tweak in the sink by calling `prepare_table` every time a batch is loaded.
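Roughly what I mean by the tweak in the sink, as a minimal sketch (assuming a singer-sdk `SQLSink` subclass such as target-snowflake's; exact method signatures may differ between SDK versions):
```python
# Sketch: re-prepare the target table before every batch so that columns which
# appear mid-stream get created before the merge/insert runs.
# Assumes a singer-sdk SQLSink subclass; an illustration, not a tested patch.
from singer_sdk.sinks import SQLSink


class ReprepareOnBatchSink(SQLSink):
    """Hypothetical sink that re-runs table preparation on every batch."""

    def process_batch(self, context: dict) -> None:
        # Apply DDL based on the latest schema seen for this stream.
        self.connector.prepare_table(
            full_table_name=self.full_table_name,
            schema=self.schema,
            primary_keys=self.key_properties,
            as_temp_table=False,
        )
        super().process_batch(context)
```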
e
I think rather than setting any additional property to `{"type": ["string", "null"]}` by default, you could state that in your own schema with `"additionalProperties": {"type": ["string", "null"]}`. But the problem with Snowflake not recognizing that, and thus not creating the appropriate columns, persists. Does using https://docs.meltano.com/concepts/plugins/#schema-extra with a wildcard not work for your use case?
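For example, a sketch of the schema extra with a wildcard (the tap name is a placeholder; see the linked docs for the exact matching rules):
```yaml
plugins:
  extractors:
    - name: tap-myvendor          # placeholder tap name
      schema:
        meetingmetrics:           # stream name
          "*":                    # wildcard over property names
            type:
              - string
              - "null"
```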
s
I need to check the wildcard part. I am just curious: if the target doesn't know the schema, how does it add new columns? Is there some target which does this now? I can take a look and see if target-snowflake can adapt to it.
e
> if the target doesn't know the schema, how does it add new columns? Is there some target which does this now?
I don't think something like that has been implemented so far.