# singer-tap-development
s
Hi All, I have a vendor who keeps adding data points to the response. I was using a JSON Schema for this, but I'm learning that I need to update it often due to changes in the product, and I will be missing backfill data until I make those changes. I would like to know if there is a way to handle this. The schema looks like this:
```json
{
  "$schema": "<http://json-schema.org/draft-07/schema#>",
  "title": "Generated schema for Root",
  "type": ["object", "null"],
  "properties": {
    "LogFileName": {
      "type": ["string", "null"]
    },
    "LastModified": {
      "type": ["string", "null"]
    },
    "insertId": {
      "type": ["string", "null"]
    },
    "timestamp": {
      "type": ["string", "null"]
    },
    "jsonPayload": {
      "type": ["object", "null"],
      "properties": {
        "ENTITY 1": {},
        "ENTITY 2": {},
        "ENTITY 3": {},
        "ENTITY 4": {},
        "ENTITY 5": {},
        "ENTITY 6": {},
        "ENTITY 7": {},
        // this list keeps growing, and a given response will contain only one of the entities, not all of them
        "ENTITY N": {}
      }
    }
  }
}
```
I am also flattening the schema to reduce the Snowflake compute cost for the transformation, as the volume is huge.
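A minimal sketch of how the flattening could be configured in meltano.yml, assuming the tap is built on the Meltano SDK and exposes the standard flattening settings (the tap name is a placeholder):
```yaml
# meltano.yml sketch: SDK-level flattening for a hypothetical tap
plugins:
  extractors:
    - name: tap-myvendor           # placeholder tap name
      config:
        flattening_enabled: true
        flattening_max_depth: 1    # flatten only the first level, e.g. jsonPayload.*
```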
e
Have you tried using `additionalProperties` in `jsonPayload`? That way you wouldn't need to enumerate the properties and lose data when new ones are added.
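Something like this, as a minimal sketch of the `jsonPayload` part of your schema (everything else stays as you have it):
```json
"jsonPayload": {
  "type": ["object", "null"],
  "additionalProperties": true
}
```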
s
I have, and it works well. One more issue: it is saved as VARCHAR in Snowflake no matter what I do, whether I call it an object or give it format json with a string type.
Do you know how to save it as a VARIANT rather than a VARCHAR?
e
I'm AFK at the moment, but I'd probably have to look at how target-snowflake decides to map a type to `VARCHAR`.
s
@Edgar Ramírez (Arch.dev) will `additionalProperties` not work in debug mode of the tap? I see a column-ignored message even after adding `additionalProperties: true`:
`Properties ('has_meeting_summary',) were present in the 'meetingmetrics' stream but not found in catalog schema. Ignoring.`
m
To save it as a VARIANT you might need to override the schema for that column in meltano.yml:
```yaml
schema:
  stream_name:
    column_name:
      type:
        - object
```
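In context it would look roughly like this (a sketch using the stream and column names from above; the tap name is a placeholder):
```yaml
plugins:
  extractors:
    - name: tap-myvendor        # placeholder tap name
      schema:
        meetingmetrics:         # stream name
          jsonPayload:          # column to keep as an object
            type:
              - object
              - "null"
```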
s
Thanks for the input @Matt Menzenski, I will try this. Do you mean you need to set this even when the column type is already object in the schema?
m
“In the schema” where? If you are setting the schema explicitly to “object” in the tap it may not be necessary.
s
OK, I figured it out on `additionalProperties`: it looks like the message
`Properties ('has_meeting_summary',) were present in the 'meetingmetrics' stream but not found in catalog schema. Ignoring.`
is emitted even after `additionalProperties` is added to the schema.
I have added an object type in the JSON schema file, but somehow Snowflake still treats it as a string, or it flattens the object. I will test more and update.
e
The SDK v0.37.0+ should not drop fields from `object`s marked with `additionalProperties`: https://github.com/meltano/sdk/pull/2301
s
Yep, saw this today; I need to test more and see what is causing this.
On the tap side, I can confirm the message is inaccurate: the data is added to the record that is emitted, but the message is still logged because the property is marked under `unmapped_properties` in the code rather than skipped when `additionalProperties` is set. https://github.com/meltano/sdk/blob/4674b3f3ddbfeeb9a38588f52b8ac72eb80c61c3/singer_sdk/helpers/_typing.py#L434
e
Ah ok. So the field isn't actually dropped and the warning is inaccurate.
s
Yes, I have to validate why the target is skipping it, as it is not sent to the DB.
Singer Tap Emit
@Edgar Ramírez (Arch.dev), I think this is the reason for the failure. When you set `additionalProperties` to `true`, the data is not skipped, but the schema is not updated, which results in the output appearing as it does. The Snowflake sink's database setup is triggered at the beginning, but the schema changes in the middle of the data extraction. When the merge query is built, it only includes the columns from the initial schema and disregards any newly added columns in the actual record. Here is my fix for this: https://github.com/meltano/sdk/compare/main...SidduHussain:sdk:additionalProperties-Schema-fix
I would like to know whether the schema update is expected to be automatic without the schema changes here, and whether this change is needed. I have a couple of questions:
- If schema flattening is done at the tap, can we emit multiple SCHEMA messages for the same stream, and how will the target honor the newly emitted schema?
- If the schema is flattened at the target, will other targets accept this gracefully?
I have tested it on Snowflake alone, and I can make it work with a small tweak in the sink by calling `prepare_table` every time a batch is loaded.
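Roughly what I mean by the tweak in the sink, as a minimal sketch (assuming a singer-sdk `SQLSink` subclass such as target-snowflake's; exact method signatures may differ between SDK versions):
```python
# Sketch: re-prepare the target table before every batch so that columns which
# appear mid-stream get created before the merge/insert runs.
# Assumes a singer-sdk SQLSink subclass; an illustration, not a tested patch.
from singer_sdk.sinks import SQLSink


class ReprepareOnBatchSink(SQLSink):
    """Hypothetical sink that re-runs table preparation on every batch."""

    def process_batch(self, context: dict) -> None:
        # Apply DDL based on the latest schema seen for this stream.
        self.connector.prepare_table(
            full_table_name=self.full_table_name,
            schema=self.schema,
            primary_keys=self.key_properties,
            as_temp_table=False,
        )
        super().process_batch(context)
```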
e
I think rather than setting any additional property to `{"type": ["string", "null"]}` by default, you could state that in your own schema with `"additionalProperties": {"type": ["string", "null"]}`. But the problem with Snowflake not recognizing that, and thus not creating the appropriate columns, persists. Does using https://docs.meltano.com/concepts/plugins/#schema-extra with a wildcard not work for your use case?
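For example, a sketch of the schema extra with a wildcard (the tap name is a placeholder; see the linked docs for the exact matching rules):
```yaml
plugins:
  extractors:
    - name: tap-myvendor          # placeholder tap name
      schema:
        meetingmetrics:           # stream name
          "*":                    # wildcard over property names
            type:
              - string
              - "null"
```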
s
I need to check the wildcard part. I am just curious: if the target doesn't know the schema, how does it add new columns? Is there some target which does this now? I can take a look and see if target-snowflake can adapt to it.
e
> if the target doesn't know the schema, how does it add new columns? Is there some target which does this now?
I don't think something like that has been implemented so far.