Alexander Shabunevich
09/11/2024, 1:41 PM
schema = th.PropertiesList(
    th.Property(
        "id",
        th.IntegerType,
        required=True,
        nullable=False,
        description="The post's ID",
    ),
    th.Property(
        "title",
        th.StringType,
        required=True,
        nullable=True,
        description="The post's title",
    ),
    th.Property(
        "created_at",
        th.DateTimeType,
        required=True,
        nullable=False,
        description="The post's creation date",
    ),
).to_dict()
The same stream is transformed in the tap's configuration (the required "title" property is removed):
config:
  stream_maps:
    posts:
      __alias__: posts_v2
      __filter__: record['id'] != 3
      id_hashed: md5(str(record['id']))
      author: f'{fake.first_name()} {fake.last_name()}'
      title: __NULL__
      title_new: "' '.join([c.upper() for c in record['title'].replace(' ', '')])"
      year: int(datetime.datetime.strptime(record['created_at'], '%Y-%m-%dT%H:%M:%SZ').year)
      month: int(datetime.datetime.strptime(record['created_at'], '%Y-%m-%dT%H:%M:%SZ').month)
      day: int(datetime.datetime.strptime(record['created_at'], '%Y-%m-%dT%H:%M:%SZ').day)
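To make the mapping easier to follow, here is a rough plain-Python sketch of what these expressions would produce for one record. The sample record is made up, the local md5 function is only a stand-in that assumes the stream map helper returns a hex digest, and the fake-based author field is left out because it is random.

```python
import datetime
import hashlib

# Hypothetical sample record; values are invented for illustration.
record = {"id": 1, "title": "hello world", "created_at": "2024-09-11T13:41:00Z"}


def md5(value: str) -> str:
    """Stand-in for the md5 helper used in the stream map (assumed hex digest)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


created = datetime.datetime.strptime(record["created_at"], "%Y-%m-%dT%H:%M:%SZ")

# __filter__: the record is kept only when this is True.
keep = record["id"] != 3

# "title" is intentionally absent, mirroring `title: __NULL__`.
mapped = {
    "id": record["id"],
    "created_at": record["created_at"],
    "id_hashed": md5(str(record["id"])),
    "title_new": " ".join([c.upper() for c in record["title"].replace(" ", "")]),
    "year": int(created.year),
    "month": int(created.month),
    "day": int(created.day),
}

print(keep, mapped)
```

The point to notice is that the mapped record no longer carries a "title" key at all, which is what later clashes with the schema's required list.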
As a result, I see the following SCHEMA message generated:
{
  "type": "SCHEMA",
  "stream": "posts_v2",
  "schema": {
    "properties": {
      "id": {
        "description": "The post's ID",
        "type": ["integer"]
      },
      "created_at": {
        "description": "The post's creation date",
        "format": "date-time",
        "type": ["string"]
      },
      "id_hashed": {
        "type": ["string", "null"]
      },
      "author": {
        "type": ["string", "null"]
      },
      "title_new": {
        "type": ["string", "null"]
      },
      "year": {
        "type": ["integer", "null"]
      },
      "month": {
        "type": ["integer", "null"]
      },
      "day": {
        "type": ["integer", "null"]
      }
    },
    "type": "object",
    "required": ["id", "title", "created_at"]
  },
  "key_properties": ["id"],
  "bookmark_properties": ["id"]
}
The problem is that the new schema still lists "title" in its required properties, which causes validation errors in the target, so I have to disable validation in the target.
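The failure described above can be reproduced without any target at all. This is a minimal sketch that only mimics the JSON Schema "required" keyword with plain Python; the helper name missing_required and the sample record are hypothetical, and a real target would typically run a full JSON Schema validator instead.

```python
# Trimmed copy of the emitted schema: "title" is required but has no property.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer"]},
        "created_at": {"type": ["string"], "format": "date-time"},
    },
    "required": ["id", "title", "created_at"],
}

# Hypothetical record as it would look after the stream map removed "title".
record = {"id": 1, "created_at": "2024-09-11T13:41:00Z"}


def missing_required(schema: dict, record: dict) -> list:
    """Return the required keys that the record does not contain."""
    return [key for key in schema.get("required", []) if key not in record]


print(missing_required(schema, record))
```

Any record passing through the map would report "title" as missing, so every record fails validation rather than just a few.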
Question: Is there any way to update the list of required properties in the SCHEMA message after applying the transformation in stream_maps?
visch
09/11/2024, 1:47 PM
Alexander Shabunevich
09/11/2024, 1:50 PM
Alexander Shabunevich
09/11/2024, 1:50 PM
visch
09/11/2024, 1:53 PM
visch
09/11/2024, 1:54 PM
Alexander Shabunevich
09/11/2024, 1:57 PM
Alexander Shabunevich
09/11/2024, 2:12 PM
- name: tap-...
  inherit_from: ...
  schema:
    posts_v2:
      id:
        type: ["integer"]
      id_hashed:
        type: ["string"]
      title_new:
        type: ["string"]
      author:
        type: ["string"]
      created_at:
        type: ["string"]
        format: date-time
Not sure whether it is possible to define a schema for a stream created using an alias.
haleemur_ali
09/11/2024, 2:23 PM
schema:
  posts_v2:
    title:
      type: ["string", "null"]
      required: false
Alexander Shabunevich
09/11/2024, 2:30 PM
Alexander Shabunevich
09/11/2024, 2:31 PM
Alexander Shabunevich
09/11/2024, 2:33 PM
{
  "type": "SCHEMA",
  "stream": "posts_v2",
  "schema": {
    "properties": {
      "id": {
        "description": "The post's ID",
        "type": ["integer"]
      },
      "created_at": {
        "description": "The post's creation date",
        "format": "date-time",
        "type": ["string"]
      },
      "id_hashed": {
        "type": ["string", "null"]
      },
      "author": {
        "type": ["string", "null"]
      },
      "title_new": {
        "type": ["string", "null"]
      }
    },
    "type": "object",
    "required": ["id", "title", "created_at"]
  },
  "key_properties": ["id"],
  "bookmark_properties": ["id"]
}
haleemur_ali
09/11/2024, 2:40 PM
The schema extra allows you to override the schema that's obtained through discovery. In addition to setting required to false for type, could you try deselecting it explicitly, as in
select:
- !posts_v2.type
another
haleemur_ali
09/11/2024, 2:42 PM
Alexander Shabunevich
09/11/2024, 2:50 PM
Alexander Shabunevich
09/11/2024, 2:50 PM
haleemur_ali
09/11/2024, 2:51 PM
!posts.type. It's a bit hard to follow what could be going on. Would it be possible to share the meltano.yml file with sensitive bits redacted?
Edgar Ramírez (Arch.dev)
09/11/2024, 3:20 PM
required=True results in the property being added to the required array of the object in the JSON schema:
{"properties": {"id": {"type": "integer"}, ...}, "required": ["id"]}
And we currently don't remove the key from that array when it's popped by the stream map.
Edgar Ramírez (Arch.dev)
09/11/2024, 3:20 PM
Edgar Ramírez (Arch.dev)
09/11/2024, 3:34 PM
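Since the key is not removed from the required array when the stream map pops the property, one conceivable stopgap is to prune the array yourself before the schema reaches validation. The sketch below is only an illustration: prune_required is a hypothetical helper, not an SDK function, and where it would be hooked in (a custom mapper, a patched target, and so on) is left open.

```python
import copy


def prune_required(schema: dict) -> dict:
    """Return a copy of the schema whose "required" list only names existing properties."""
    pruned = copy.deepcopy(schema)  # leave the caller's schema untouched
    properties = pruned.get("properties", {})
    required = [key for key in pruned.get("required", []) if key in properties]
    if required:
        pruned["required"] = required
    else:
        # Drop the keyword entirely rather than emit an empty array.
        pruned.pop("required", None)
    return pruned


# Trimmed version of the schema from the thread: "title" was removed by the map.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer"]},
        "created_at": {"type": ["string"], "format": "date-time"},
        "title_new": {"type": ["string", "null"]},
    },
    "required": ["id", "title", "created_at"],
}

print(prune_required(schema)["required"])
```

Filtering against the properties that actually exist keeps the sketch independent of which fields a given stream map removes.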