Siddu Hussain
06/26/2024, 6:24 AM
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Generated schema for Root",
  "type": ["object", "null"],
  "properties": {
    "LogFileName": {
      "type": ["string", "null"]
    },
    "LastModified": {
      "type": ["string", "null"]
    },
    "insertId": {
      "type": ["string", "null"]
    },
    "timestamp": {
      "type": ["string", "null"]
    },
    "jsonPayload": {
      "type": ["object", "null"],
      "properties": {
        "ENTITY 1": {},
        "ENTITY 2": {},
        "ENTITY 3": {},
        "ENTITY 4": {},
        "ENTITY 5": {},
        "ENTITY 6": {},
        "ENTITY 7": {},
        "ENTITY N": {}
      }
    }
  }
}
```
The list of `ENTITY` keys keeps growing, and each response contains only one of the entities, not all of them.
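Because `jsonPayload` carries one of an open-ended set of `ENTITY` keys, flattening turns every new entity into a new top-level column. A minimal sketch, assuming recursive flattening with a double-underscore separator and no depth limit (the `flatten_record` helper and the sample records are hypothetical):

```python
from typing import Any


def flatten_record(record: dict[str, Any], parent_key: str = "", sep: str = "__") -> dict[str, Any]:
    """Recursively flatten nested dicts into one level, joining keys with `sep`.

    Lists, scalars, and empty dicts are kept as values; there is no depth limit.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict) and value:
            flat.update(flatten_record(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


records = [
    {"insertId": "a1", "jsonPayload": {"ENTITY 1": {"count": 3}}},
    {"insertId": "a2", "jsonPayload": {"ENTITY 7": {"count": 5}}},
]

# Each record contributes a column that the other one lacks.
for rec in records:
    print(sorted(flatten_record(rec)))
# ['insertId', 'jsonPayload__ENTITY 1__count'] then ['insertId', 'jsonPayload__ENTITY 7__count']
```

A schema captured from the first record alone cannot describe the second one.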
I am also flattening the schema to reduce the Snowflake compute cost of the transformation, as the data volume is huge.
Edgar Ramírez (Arch.dev)
06/26/2024, 10:19 AM
`additionalProperties` in `jsonPayload`? That way you wouldn't need to enumerate the properties, and you wouldn't lose data when new ones are added.
Siddu Hussain
06/26/2024, 1:18 PM
Siddu Hussain
06/26/2024, 1:19 PM
Edgar Ramírez (Arch.dev)
06/26/2024, 2:43 PM
`VARCHAR`
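For reference, in JSON Schema `additionalProperties` governs keys that are not listed under `properties`; when it is omitted, extra keys are allowed. The check below is a simplified, hypothetical helper rather than a validator: it ignores `patternProperties` and does not validate values against an `additionalProperties` subschema.

```python
from typing import Any


def property_allowed(object_schema: dict[str, Any], name: str) -> bool:
    """Simplified JSON Schema check for whether a key may appear in an object.

    Considers only `properties` and `additionalProperties`; ignores
    `patternProperties` and does not validate the value itself.
    """
    if name in object_schema.get("properties", {}):
        return True
    # Spec default: an omitted additionalProperties allows any extra key.
    additional = object_schema.get("additionalProperties", True)
    return additional is not False


open_payload = {
    "type": ["object", "null"],
    "properties": {},              # no need to enumerate ENTITY 1..N
    "additionalProperties": True,  # any new ENTITY key is accepted
}
closed_payload = {"type": "object", "properties": {"ENTITY 1": {}}, "additionalProperties": False}

print(property_allowed(open_payload, "ENTITY 42"))    # True
print(property_allowed(closed_payload, "ENTITY 42"))  # False
print(property_allowed(closed_payload, "ENTITY 1"))   # True
```

Being allowed by the schema does not by itself mean a tap keeps the key; that depends on the tap's own record handling.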
Siddu Hussain
06/27/2024, 8:18 PM
Does `additionalProperties` not work in the tap's debug mode? I see a column-ignored message even after adding `"additionalProperties": true`:
`Properties ('has_meeting_summary',) were present in the 'meetingmetrics' stream but not found in catalog schema. Ignoring.`
Matt Menzenski
06/28/2024, 12:48 AM
```yaml
schema:
  stream_name:
    column_name:
      type:
        - object
```
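Matt's override pins the type of one column in one stream. To illustrate how a wildcard variant of such an override could be applied, here is a hypothetical sketch that merges property-level overrides into stream schemas using shell-style glob patterns (`fnmatchcase`). It mirrors the YAML shape above but is not Meltano's implementation; in this sketch a pattern can only match properties already present in the catalog.

```python
from fnmatch import fnmatchcase
from typing import Any

# Hypothetical shape mirroring the YAML: {stream_pattern: {property_pattern: extra}}
Overrides = dict[str, dict[str, dict[str, Any]]]


def apply_schema_extra(streams: dict[str, dict[str, Any]], overrides: Overrides) -> None:
    """Merge overrides into matching properties of each stream schema, in place."""
    for stream_name, schema in streams.items():
        props = schema.get("properties", {})
        for stream_pat, prop_overrides in overrides.items():
            if not fnmatchcase(stream_name, stream_pat):
                continue
            for prop_pat, extra in prop_overrides.items():
                for prop_name, prop_schema in props.items():
                    if fnmatchcase(prop_name, prop_pat):
                        prop_schema.update(extra)


streams = {
    "meetingmetrics": {
        "properties": {
            "insertId": {"type": ["string", "null"]},
            "jsonPayload": {"type": ["object", "null"]},
        }
    }
}
apply_schema_extra(streams, {"meeting*": {"json*": {"type": ["object"], "additionalProperties": True}}})
print(streams["meetingmetrics"]["properties"]["jsonPayload"])
# {'type': ['object'], 'additionalProperties': True}
```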
Siddu Hussain
06/28/2024, 3:00 AM
Matt Menzenski
06/28/2024, 11:00 AM
Siddu Hussain
06/28/2024, 11:22 AM
The message `Properties ('has_meeting_summary',) were present in the 'meetingmetrics' stream but not found in catalog schema. Ignoring.` is emitted even after `additionalProperties` is added to the schema.
Siddu Hussain
06/28/2024, 11:24 AM
Edgar Ramírez (Arch.dev)
06/28/2024, 1:00 PM
Siddu Hussain
06/28/2024, 1:02 PM
Siddu Hussain
06/28/2024, 1:15 PM
`additionalProperties` is set.
https://github.com/meltano/sdk/blob/4674b3f3ddbfeeb9a38588f52b8ac72eb80c61c3/singer_sdk/helpers/_typing.py#L434
Edgar Ramírez (Arch.dev)
06/28/2024, 5:10 PM
Siddu Hussain
06/28/2024, 5:59 PM
Siddu Hussain
07/05/2024, 11:38 AM
Siddu Hussain
07/05/2024, 12:10 PM
After setting `additionalProperties` to `true`, the data is not skipped, but the schema is not updated, which is why the output looks the way it does. The Snowflake sink's database setup is triggered at the beginning, but the schema changes in the middle of the data extraction.
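The tap-side behaviour described here can be sketched as a conform step that passes undeclared keys through when `additionalProperties` is set, yet never adds them to the declared `properties`, so anything built from the schema alone still misses them. `conform_record` is hypothetical and is not the Singer SDK's implementation; unlike plain JSON Schema, it treats an omitted `additionalProperties` as "drop", and its warning text is copied from the log quoted earlier in the thread.

```python
import logging
from typing import Any

logger = logging.getLogger("conform_sketch")


def conform_record(record: dict[str, Any], schema: dict[str, Any], stream: str) -> dict[str, Any]:
    """Drop undeclared properties unless `additionalProperties` allows them.

    Hypothetical helper: `additionalProperties` set to True or to a subschema
    keeps extra keys; omitted or False drops them with a warning. The schema
    itself is never modified.
    """
    declared = schema.get("properties", {})
    additional = schema.get("additionalProperties")
    keep_extra = additional is not None and additional is not False
    output: dict[str, Any] = {}
    unmapped: list[str] = []
    for name, value in record.items():
        if name in declared or keep_extra:
            output[name] = value
        else:
            unmapped.append(name)
    if unmapped:
        logger.warning(
            "Properties %s were present in the '%s' stream but not found in catalog schema. Ignoring.",
            tuple(unmapped),
            stream,
        )
    return output


schema = {"properties": {"id": {"type": "string"}}}
record = {"id": "m1", "has_meeting_summary": True}
print(conform_record(record, schema, "meetingmetrics"))  # {'id': 'm1'}
print(conform_record(record, {**schema, "additionalProperties": True}, "meetingmetrics"))
# {'id': 'm1', 'has_meeting_summary': True}
print(sorted(schema["properties"]))  # ['id']
```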
As a result, when building the merge query, the sink includes only the columns from the initial schema and disregards any columns newly added in the actual record.
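This failure mode can be reproduced in a few lines: a MERGE statement whose column list is fixed when the table is set up cannot mention keys that appear later. `build_merge_sql`, `unseen_columns`, and the table names are hypothetical; the SQL follows Snowflake's `MERGE INTO ... USING ... WHEN MATCHED ... WHEN NOT MATCHED` form but is only printed, not executed.

```python
from typing import Any, Iterable


def quote_ident(name: str) -> str:
    """Double-quote an identifier (needed for names with spaces or mixed case)."""
    return '"' + name.replace('"', '""') + '"'


def build_merge_sql(target: str, staging: str, columns: list[str], key: str) -> str:
    """Build a MERGE from an explicit column list; any other keys are silently ignored."""
    updates = ", ".join(f"t.{quote_ident(c)} = s.{quote_ident(c)}" for c in columns if c != key)
    matched = f"WHEN MATCHED THEN UPDATE SET {updates} " if updates else ""
    col_list = ", ".join(quote_ident(c) for c in columns)
    val_list = ", ".join(f"s.{quote_ident(c)}" for c in columns)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{quote_ident(key)} = s.{quote_ident(key)} "
        f"{matched}WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )


def unseen_columns(known: Iterable[str], batch: list[dict[str, Any]]) -> list[str]:
    """Return record keys not covered by `known`, in first-seen order."""
    seen = set(known)
    new: list[str] = []
    for record in batch:
        for name in record:
            if name not in seen:
                seen.add(name)
                new.append(name)
    return new


initial_columns = ["insertId", "jsonPayload__ENTITY 1"]  # fixed at setup time
batch = [{"insertId": "a2", "jsonPayload__ENTITY 7": "5"}]
print(unseen_columns(initial_columns, batch))  # ['jsonPayload__ENTITY 7']
print(build_merge_sql("METRICS", "METRICS_STAGE", initial_columns, "insertId"))
```

The printed statement never mentions `ENTITY 7`, so that value is dropped at load time even though the tap emitted it.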
Here is a fix for this: https://github.com/meltano/sdk/compare/main...SidduHussain:sdk:additionalProperties-Schema-fix
I would like to know whether the schema is expected to update automatically, without the schema changes in this fix.
If this change is needed, I have two questions:
-- If schema flattening is done at the tap, can we emit multiple schemas for the same stream, and how will the target honor the newly emitted schema?
-- If the schema is flattened at the target, will other targets accept this gracefully? I have tested it only on Snowflake, and I can make it work with a small tweak in the sink: calling `prepare_table` every time a batch is loaded.
Edgar Ramírez (Arch.dev)
07/05/2024, 6:24 PM
`{"type": ["string", "null"]}` by default, you could declare that in your own schema with `"additionalProperties": {"type": ["string", "null"]}`. But the problem of Snowflake not recognizing that, and thus not creating the appropriate columns, persists.
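Why the `additionalProperties` declaration doesn't help on the Snowflake side can be seen in a sketch of table creation: a column list derived from `properties` has a name for every declared key, while `additionalProperties` describes values but carries no names. The type mapping and `create_table_sql` below are hypothetical simplifications, not the target's code.

```python
from typing import Any

# Hypothetical, simplified JSON Schema type -> Snowflake column type mapping.
TYPE_MAP = {
    "string": "VARCHAR",
    "integer": "NUMBER",
    "number": "FLOAT",
    "boolean": "BOOLEAN",
    "object": "VARIANT",
    "array": "VARIANT",
}


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_type(prop_schema: dict[str, Any]) -> str:
    """Map the first non-null JSON type to a column type; default to VARCHAR."""
    types = prop_schema.get("type", [])
    if isinstance(types, str):
        types = [types]
    for t in types:
        if t != "null":
            return TYPE_MAP.get(t, "VARCHAR")
    return "VARCHAR"


def create_table_sql(table: str, schema: dict[str, Any]) -> str:
    """Derive columns from `properties` only: `additionalProperties` has no key names."""
    cols = ", ".join(
        f"{quote_ident(name)} {column_type(prop)}"
        for name, prop in schema.get("properties", {}).items()
    )
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols})"


schema = {
    "properties": {"insertId": {"type": ["string", "null"]}},
    "additionalProperties": {"type": ["string", "null"]},
}
print(create_table_sql("METRICS", schema))
# CREATE TABLE IF NOT EXISTS METRICS ("insertId" VARCHAR)
```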
Does using https://docs.meltano.com/concepts/plugins/#schema-extra with a wildcard not work for your use case?
Siddu Hussain
07/05/2024, 6:55 PM
Edgar Ramírez (Arch.dev)
07/05/2024, 7:32 PM
> If the target doesn't know the schema, how does it add new columns? Is there some target that does this now?
I don't think anything like that has been implemented so far.
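For the open question of a target adding columns it was never told about, one possible approach (a hypothetical sketch, not an existing target's behaviour) is to compare each batch against the known columns before loading, type any new key from the stream's `additionalProperties` subschema (falling back to `VARCHAR`), and emit `ALTER TABLE ... ADD COLUMN` statements. This is roughly the per-batch table-preparation idea from earlier in the thread, reduced to the DDL step; `EvolvingTable` and `prepare_batch` are illustrative names only.

```python
from typing import Any

# Hypothetical mapping used only for typing brand-new columns.
TYPE_MAP = {"string": "VARCHAR", "integer": "NUMBER", "number": "FLOAT", "boolean": "BOOLEAN"}


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


class EvolvingTable:
    """Tracks a table's known columns and emits DDL for keys first seen in a batch.

    Hypothetical helper, not part of any existing target.
    """

    def __init__(self, name: str, columns: dict[str, str], additional: Any = None) -> None:
        self.name = name
        self.columns = dict(columns)  # column name -> Snowflake type
        self.additional = additional  # the stream's additionalProperties value

    def _new_column_type(self) -> str:
        """Type new columns from an additionalProperties subschema, else VARCHAR."""
        if isinstance(self.additional, dict):
            types = self.additional.get("type", [])
            types = [types] if isinstance(types, str) else types
            for t in types:
                if t in TYPE_MAP:
                    return TYPE_MAP[t]
        return "VARCHAR"

    def prepare_batch(self, batch: list[dict[str, Any]]) -> list[str]:
        """Return ALTER TABLE statements for unseen keys and mark them as known."""
        statements: list[str] = []
        for record in batch:
            for key in record:
                if key not in self.columns:
                    col_type = self._new_column_type()
                    self.columns[key] = col_type
                    statements.append(
                        f"ALTER TABLE {self.name} ADD COLUMN {quote_ident(key)} {col_type}"
                    )
        return statements


table = EvolvingTable("METRICS", {"insertId": "VARCHAR"}, {"type": ["string", "null"]})
first = table.prepare_batch([{"insertId": "a1", "jsonPayload__ENTITY 7": "5"}])
second = table.prepare_batch([{"insertId": "a2", "jsonPayload__ENTITY 7": "6"}])
print(first)   # ['ALTER TABLE METRICS ADD COLUMN "jsonPayload__ENTITY 7" VARCHAR']
print(second)  # []
```

Running the DDL before each load keeps the table, and any MERGE built from `table.columns`, in step with the records; whether other targets tolerate columns appearing mid-stream is still the open question raised above.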