Hello, I have question related to the "stream_maps...
# troubleshooting
a
Hello, I have a question about the "stream_maps" functionality. Given: a stream in a custom tap with the following schema (note that "title" is required):
from singer_sdk import typing as th

schema = th.PropertiesList(
        th.Property(
            "id",
            th.IntegerType,
            required=True,
            nullable=False,
            description="The post's ID",
        ),
        th.Property(
            "title",
            th.StringType,
            required=True,
            nullable=True,
            description="The post's title",
        ),
        th.Property(
            "created_at",
            th.DateTimeType,
            required=True,
            nullable=False,
            description="The post's creation date",
        ),
    ).to_dict()
The same stream is transformed in the tap's configuration (the required "title" property is removed):
config:
    stream_maps:     
        posts:
          __alias__: posts_v2
          __filter__: record['id'] != 3
          id_hashed: md5(str(record['id']))
          author: f'{fake.first_name()} {fake.last_name()}'
          title: __NULL__
          title_new: "' '.join([c.upper() for c in record['title'].replace(' ', '')])"
          year: int(datetime.datetime.strptime(record['created_at'], '%Y-%m-%dT%H:%M:%SZ').year)
          month: int(datetime.datetime.strptime(record['created_at'], '%Y-%m-%dT%H:%M:%SZ').month)
          day: int(datetime.datetime.strptime(record['created_at'], '%Y-%m-%dT%H:%M:%SZ').day)
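To make the expressions above concrete, here is a rough plain-Python sketch of what the "title_new", "year", "month" and "day" mappings compute for one record. The sample record is made up for illustration, and the SDK runs these expressions in its own evaluator, so this only approximates the behavior.

```python
import datetime

# Hypothetical sample record, shaped like the "posts" schema above.
record = {"id": 1, "title": "hello world", "created_at": "2023-04-05T06:07:08Z"}

# Same expression as title_new: drop spaces, upper-case each character,
# then join the characters with single spaces.
title_new = " ".join([c.upper() for c in record["title"].replace(" ", "")])

# Same parsing as the year/month/day mappings.
created = datetime.datetime.strptime(record["created_at"], "%Y-%m-%dT%H:%M:%SZ")
year, month, day = int(created.year), int(created.month), int(created.day)

print(title_new)
print(year, month, day)
```

Since "title" is declared with nullable=True, a record whose title is None would raise an AttributeError on .replace in this plain-Python version, so a guard may be worth adding to the expression.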
As a result, I see the following SCHEMA object generated:
{
  "type": "SCHEMA",
  "stream": "posts_v2",
  "schema": {
    "properties": {
      "id": {
        "description": "The post's ID",
        "type": [
          "integer"
        ]
      },
      "created_at": {
        "description": "The post's creation date",
        "format": "date-time",
        "type": [
          "string"
        ]
      },
      "id_hashed": {
        "type": [
          "string",
          "null"
        ]
      },
      "author": {
        "type": [
          "string",
          "null"
        ]
      },
      "title_new": {
        "type": [
          "string",
          "null"
        ]
      },
      "year": {
        "type": [
          "integer",
          "null"
        ]
      },
      "month": {
        "type": [
          "integer",
          "null"
        ]
      },
      "day": {
        "type": [
          "integer",
          "null"
        ]
      }
    },
    "type": "object",
    "required": [
      "id",
      "title",
      "created_at"
    ]
  },
  "key_properties": [
    "id"
  ],
  "bookmark_properties": [
    "id"
  ]
}
The problem is that the new schema still lists "title" in its required properties, which causes validation errors in the target, so I have to disable validation there. Question: is there any way to update the required properties list in the SCHEMA object after the stream_maps transformation is applied?
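A small sketch of why a validating target would reject these records: the schema still requires "title", but the mapped records no longer carry it. The helper "missing_required" is hypothetical and is not part of the SDK or of any target; the schema is a trimmed copy of the one above and the record is invented.

```python
def missing_required(schema: dict, record: dict) -> list:
    """Return the required keys that are absent from the record."""
    return [key for key in schema.get("required", []) if key not in record]


# Trimmed version of the emitted posts_v2 schema.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer"]},
        "created_at": {"type": ["string"], "format": "date-time"},
    },
    "required": ["id", "title", "created_at"],
}

# Invented record as it might look after the stream map removed "title".
record = {"id": 1, "created_at": "2023-04-05T06:07:08Z"}

print(missing_required(schema, record))  # ['title']
```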
v
You can override the schema to make title not be required
a
If you mean updating schema = th.PropertiesList, then that would be a hack. I want to enforce the schema for the source and, at the same time, apply these transformations.
My expectation was that the schema would be updated accordingly after stream_maps is applied.
v
I'm not exactly sure how stream maps should/shouldn't handle this
a
Let me check if it works, thanks!
It doesn't work:
  - name: tap-...
    inherit_from: ...
    schema:
      posts_v2:
        id:
          type: ["integer"]
        id_hashed:
          type: ["string"]
        title_new:
          type: ["string"]
        author:
          type: ["string"]
        created_at:
          type: ["string"]
          format: date-time
I'm not sure it is possible to define a schema for a stream created using an alias
h
Could you try setting required to false?
schema:
      posts_v2:
        title:
          type: ["string", "null"]
          required: false
a
The problem is that I don't have "title" in the new schema 🙂 it was removed by stream_maps
Let me try to add it without a type spec
Nothing has changed:
{
  "type": "SCHEMA",
  "stream": "posts_v2",
  "schema": {
    "properties": {
      "id": {
        "description": "The post's ID",
        "type": [
          "integer"
        ]
      },
      "created_at": {
        "description": "The post's creation date",
        "format": "date-time",
        "type": [
          "string"
        ]
      },
      "id_hashed": {
        "type": [
          "string",
          "null"
        ]
      },
      "author": {
        "type": [
          "string",
          "null"
        ]
      },
      "title_new": {
        "type": [
          "string",
          "null"
        ]
      }
    },
    "type": "object",
    "required": [
      "id",
      "title",
      "created_at"
    ]
  },
  "key_properties": [
    "id"
  ],
  "bookmark_properties": [
    "id"
  ]
}
h
So, the "schema" extra allows you to override the schema that's obtained through discovery. In addition to setting "required" to false for "type", could you try deselecting it explicitly, as in:
select:
  - "!posts_v2.type"
Another thing to try is to set the field's inclusion to available. It may currently be automatic, in which case it's selected by default.
a
"posts_v2" doesn't exist in the source system; it is a new stream created by stream_maps from the source stream called "posts"
h
Ah, in that case, the selection rule ought to be "!posts.type". It's a bit hard to follow what could be going on. Would it be possible to share the meltano.yml file with sensitive bits redacted?
e
So, using required=True results in the property being added to the "required" array of the object in the JSON schema:
{"properties": {"id": {"type": "integer"}, ...}, "required": ["id"]}
And we currently don't remove the key from that array when it's popped by the stream map.
I'll put up an issue
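Until the stream map handles this itself, one possible workaround is to drop entries from "required" that no longer exist under "properties" before the schema reaches a validating target. This is only a sketch: "prune_required" is a hypothetical helper, not an SDK function, and where to hook it in (for example a patched tap or an intermediate script) is left as an assumption.

```python
import copy


def prune_required(schema: dict) -> dict:
    """Return a copy of the schema whose "required" list names only existing properties."""
    pruned = copy.deepcopy(schema)
    properties = pruned.get("properties", {})
    required = [key for key in pruned.get("required", []) if key in properties]
    if required:
        pruned["required"] = required
    else:
        # Drop the key entirely rather than leaving an empty list behind.
        pruned.pop("required", None)
    return pruned


# Trimmed version of the emitted posts_v2 schema, where "title" was popped.
emitted = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer"]},
        "created_at": {"type": ["string"], "format": "date-time"},
    },
    "required": ["id", "title", "created_at"],
}

print(prune_required(emitted)["required"])  # ['id', 'created_at']
```

The helper works on a deep copy, so the original schema dictionary is left untouched.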