# troubleshooting
prakhar_srivastava
Hi, I've been trying to add a default value to a schema via the SDK typing helpers, but during the EL process the SCHEMA message omits this `default` property, even though the property is there when I just access the stream object locally. Can someone point me in the direction of where this could happen?
This is the SCHEMA message as emitted by the tap:
```json
{
  "type": "SCHEMA",
  "stream": "users",
  "schema": {
    "properties": {
      "self": {"type": ["string", "null"]},
      "accountId": {"type": ["string", "null"]},
      "accountType": {"type": ["string", "null"]},
      "emailAddress": {"type": ["string", "null"]},
      "avatarUrls": {
        "properties": {
          "48x48": {"type": ["string", "null"]},
          "24x24": {"type": ["string", "null"]},
          "16x16": {"type": ["string", "null"]},
          "32x32": {"type": ["string", "null"]}
        },
        "type": "object"
      },
      "displayName": {"type": ["string", "null"]},
      "active": {"type": ["boolean", "null"]},
      "timeZone": {"type": ["string", "null"]},
      "locale": {"type": ["string", "null"]}
    },
    "type": "object",
    "required": ["avatarUrls"]
  },
  "key_properties": ["accountId"],
  "bookmark_properties": ["accountId"]
}
```
The stream's schema definition:
```python
from singer_sdk.typing import (
    BooleanType,
    ObjectType,
    PropertiesList,
    Property,
    StringType,
)

schema = PropertiesList(
    Property("self", StringType),
    Property("accountId", StringType),
    Property("accountType", StringType),
    Property("emailAddress", StringType),
    Property(
        "avatarUrls",
        ObjectType(
            Property("48x48", StringType),
            Property("24x24", StringType),
            Property("16x16", StringType),
            Property("32x32", StringType),
        ),
        default={},  # expected to show up as "default": {} in the SCHEMA message
        required=True,
    ),
    Property("displayName", StringType),
    Property("active", BooleanType),
    Property("timeZone", StringType),
    Property("locale", StringType),
).to_dict()
```
If I access the stream's schema via IPython, the `default` is there.
I got that result by disabling the jira-stream inheritance, so there has to be something in the stream class that alters the `default` definition.
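One way to pin down where the `default` disappears is to diff the locally built schema dict against the `schema` field of the emitted SCHEMA message. Below is a minimal stdlib sketch, assuming both are available as plain dicts; `find_dropped_keys` is a hypothetical helper (not part of the SDK), and the two schemas are trimmed-down versions of the ones in this thread:

```python
import json
from typing import Any, Iterator


def find_dropped_keys(local: Any, emitted: Any, path: str = "$") -> Iterator[str]:
    """Yield JSON paths of keys present in `local` but missing from `emitted`."""
    # Only dicts have keys; lists and scalars are not compared here.
    if not isinstance(local, dict) or not isinstance(emitted, dict):
        return
    for key, value in local.items():
        child = f"{path}.{key}"
        if key not in emitted:
            yield child
        else:
            # Recurse so drops nested inside "properties" are found too.
            yield from find_dropped_keys(value, emitted[key], child)


# Trimmed-down schema as built locally (includes the default).
local_schema = {
    "type": "object",
    "properties": {
        "avatarUrls": {"type": "object", "default": {}, "properties": {}},
    },
    "required": ["avatarUrls"],
}

# The same schema as it appears in the emitted SCHEMA message (default missing).
message = json.loads(
    '{"type": "SCHEMA", "stream": "users", "schema": {"type": "object",'
    ' "properties": {"avatarUrls": {"type": "object", "properties": {}}},'
    ' "required": ["avatarUrls"]}}'
)

for dropped in find_dropped_keys(local_schema, message["schema"]):
    print(dropped)  # prints: $.properties.avatarUrls.default
```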
visch
What default are you trying to set? An empty dict? That isn't going to work. Why not make the `avatarUrls` property `required=False` and have no default?
prakhar_srivastava
I set `required` only to check whether it carries through to the final SCHEMA message, which it does.
I'm trying to use this SCHEMA message to create a Parquet schema downstream, so it would be much easier if I could get default values (an empty dict) so that I can populate my Parquet file.
Right now it raises an error when the API doesn't send `avatarUrls` or another top-level property.
In any case, since JSON Schema provides a `default` keyword that can be populated by overriding the validator, I think the emitted schema should include the `default` values.
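For context, the jsonschema documentation's FAQ shows how to extend a validator's `properties` check so that it fills in `default` values while validating. The sketch below swaps that recipe for a plain stdlib function that walks the schema directly, assuming the record and schema are ordinary dicts; `apply_defaults` is hypothetical and not an SDK or jsonschema API:

```python
import copy
from typing import Any


def apply_defaults(record: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` with missing properties filled from `default`."""
    filled = dict(record)  # never mutate the caller's record
    for name, subschema in schema.get("properties", {}).items():
        if name not in filled and "default" in subschema:
            # Deep-copy so records never share one mutable default such as {}.
            filled[name] = copy.deepcopy(subschema["default"])
        value = filled.get(name)
        # Recurse into nested objects so their own defaults are applied too.
        if isinstance(value, dict) and "properties" in subschema:
            filled[name] = apply_defaults(value, subschema)
    return filled


users_schema = {
    "type": "object",
    "properties": {
        "accountId": {"type": ["string", "null"]},
        "avatarUrls": {
            "type": "object",
            "default": {},
            "properties": {"48x48": {"type": ["string", "null"]}},
        },
    },
}

record = {"accountId": "abc123"}  # the API did not send avatarUrls
print(apply_defaults(record, users_schema))
# prints: {'accountId': 'abc123', 'avatarUrls': {}}
```

Deep-copying the default matters here: without it, every filled record would share the same `{}` object, so mutating one record's `avatarUrls` would change all of them.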
visch
These are two separate issues: you're talking about the target with regard to Parquet. For the tap, you should go with no `required`.
For the target, it should read the SCHEMA message to set up its schema; it shouldn't be using the records to set up its schema.
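A minimal sketch of the approach described here: derive the target's column layout from the SCHEMA message alone, so a column exists even if no record in a batch contains it. The type names on the right (`"string"`, `"bool"`, `"struct<...>"`) are placeholders loosely modeled on Parquet/Arrow naming, not a real library's API, and `columns_from_schema` is hypothetical:

```python
from typing import Any

# Placeholder mapping from JSON Schema types to column type names (assumption).
JSON_TO_COLUMN = {
    "string": "string",
    "boolean": "bool",
    "integer": "int64",
    "number": "double",
}


def column_type(prop: dict[str, Any]) -> str:
    """Map one JSON Schema property to a placeholder column type name."""
    types = prop.get("type", [])
    if isinstance(types, str):
        types = [types]
    # "null" only affects nullability, so ignore it when picking the type.
    non_null = [t for t in types if t != "null"]
    if non_null == ["object"]:
        fields = ", ".join(
            f"{name}: {column_type(sub)}"
            for name, sub in prop.get("properties", {}).items()
        )
        return f"struct<{fields}>"
    if len(non_null) == 1 and non_null[0] in JSON_TO_COLUMN:
        return JSON_TO_COLUMN[non_null[0]]
    return "string"  # fallback for unions or unknown types (assumption)


def columns_from_schema(schema: dict[str, Any]) -> dict[str, str]:
    """Build the column layout from the SCHEMA message, never from records."""
    return {name: column_type(p) for name, p in schema["properties"].items()}


schema = {
    "type": "object",
    "properties": {
        "accountId": {"type": ["string", "null"]},
        "active": {"type": ["boolean", "null"]},
        "avatarUrls": {
            "type": "object",
            "properties": {"48x48": {"type": ["string", "null"]}},
        },
    },
}
print(columns_from_schema(schema))
# prints: {'accountId': 'string', 'active': 'bool', 'avatarUrls': 'struct<48x48: string>'}
```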
prakhar_srivastava
The schema emitted by the tap doesn't have a `default` property, even when one is provided in the schema definition. That's the only issue.
If the emitted schema had a `default` property, I could do the schema manipulation I need (that's the actual use case).
visch
That default isn't going to work. Why do you need it so badly? Yes, there are ways to get you what you're after, but let's start with what the goal is.
What does that mean? The schema has the info it needs without the default.
prakhar_srivastava
Okay, it's a two-step process; I think I mixed things up, so let me elaborate:
1. Schema generation: this step is driven by the schema and is working.
2. Record generation: if all the records in a stream (or in a batch of a stream) are missing a property, the current target reads the records, builds a dictionary, and feeds it into the schema DataFrame. This is where it fails.
I think I can get around it by generating the schema for the records DataFrame from the SCHEMA message too,
but I'd like the `default` property to trickle down to the SCHEMA message so that I can use it for other use cases (like data sanity checks) too.
Is there a reason why the `default` property is omitted?
The `default` property is not passed at all, whether it's an empty dict or something else.
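The record-generation failure comes from inferring columns from the records themselves: if every record in a batch lacks `avatarUrls`, that column never appears. A stdlib sketch of the workaround mentioned here (conforming each record to the schema's top-level properties before it goes into the DataFrame) follows; `columns_from_records` and `conform_batch` are hypothetical helpers, and missing values are filled from `default` when the schema has one and `None` otherwise:

```python
import copy
from typing import Any


def columns_from_records(records: list[dict[str, Any]]) -> list[str]:
    """Columns as a record-driven target would infer them (first seen wins)."""
    seen: dict[str, None] = {}
    for record in records:
        for key in record:
            seen.setdefault(key, None)
    return list(seen)


def conform_batch(
    records: list[dict[str, Any]], schema: dict[str, Any]
) -> list[dict[str, Any]]:
    """Give every record exactly the schema's top-level properties, in order.

    Keys that are not in the schema are dropped.
    """
    conformed = []
    for record in records:
        row: dict[str, Any] = {}
        for name, prop in schema["properties"].items():
            if name in record:
                row[name] = record[name]
            else:
                # Schema default if present (copied so rows don't share it),
                # otherwise None, so the column exists either way.
                row[name] = copy.deepcopy(prop.get("default"))
        conformed.append(row)
    return conformed


batch_schema = {
    "properties": {
        "accountId": {"type": ["string", "null"]},
        "avatarUrls": {"type": "object", "default": {}},
    }
}
batch = [{"accountId": "a1"}, {"accountId": "a2"}]  # no record has avatarUrls
print(columns_from_records(batch))  # prints: ['accountId']
print(conform_batch(batch, batch_schema))
# prints: [{'accountId': 'a1', 'avatarUrls': {}}, {'accountId': 'a2', 'avatarUrls': {}}]
```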
edgar_ramirez_mondragon
Hi @prakhar_srivastava! I guess you're building a custom target that makes use of the `default` property? I think it's an easy fix: add `default` to the props list and dataclass here: https://github.com/meltano/sdk/blob/8ecbfb695c72a1375061e919f0d3d24147f11dd2/singer_sdk/_singerlib/schema.py#L12
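Why the key vanishes can be illustrated with a simplified model: if the SCHEMA message is rebuilt through a dataclass that copies only a fixed list of known keys, anything outside that list, such as `default`, is silently dropped. The class below is a hypothetical stand-in, not the actual code in `singer_sdk/_singerlib/schema.py`; it only shows the pattern and the shape of the suggested fix (adding `default` to both the key list and the dataclass fields):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical whitelist of keys the model copies; "default" is the suggested addition.
KNOWN_KEYS = ["type", "properties", "required", "default"]


@dataclass
class SimpleSchema:
    """Toy schema model: only whitelisted keys survive a from_dict/to_dict round trip."""

    type: Optional[Any] = None
    properties: Optional[dict[str, SimpleSchema]] = None
    required: Optional[list[str]] = None
    default: Optional[Any] = None  # the suggested new field

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> SimpleSchema:
        # Keys not in KNOWN_KEYS (e.g. "description" here) are silently ignored.
        kwargs = {key: data[key] for key in KNOWN_KEYS if key in data}
        if "properties" in kwargs:
            kwargs["properties"] = {
                name: cls.from_dict(sub) for name, sub in kwargs["properties"].items()
            }
        return cls(**kwargs)

    def to_dict(self) -> dict[str, Any]:
        out: dict[str, Any] = {}
        for key in KNOWN_KEYS:
            value = getattr(self, key)
            if value is None:  # None means "not set" in this toy model
                continue
            if key == "properties":
                value = {name: sub.to_dict() for name, sub in value.items()}
            out[key] = value
        return out


raw = {
    "type": "object",
    "properties": {"avatarUrls": {"type": "object", "default": {}}},
    "required": ["avatarUrls"],
    "description": "not whitelisted",
}
print(SimpleSchema.from_dict(raw).to_dict())
# prints: {'type': 'object', 'properties': {'avatarUrls': {'type': 'object', 'default': {}}}, 'required': ['avatarUrls']}
```

Taking `"default"` out of `KNOWN_KEYS` reproduces the behavior described in this thread: the round trip then loses `"default": {}`. One limitation of this toy model is that using `None` as the "not set" marker means an explicit `"default": null` would also be dropped.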
prakhar_srivastava
Awesome! Yes, exactly. While we're at it: the reason I need the `default` property is that in the downstream target I can use it to fill in default data, which can be a time saver for schema-aware targets and when I want non-null values.
This ties back to my initial issue about handling default data: https://github.com/meltano/sdk/issues/1998
Thanks for the help as always, @visch and @edgar_ramirez_mondragon.
Also, using the above method I can just override `_validator` at the target level, so the issue doesn't make sense anymore; I'll close it.
But some food for thought: should this be available out of the box, or should it be documented?
edgar_ramirez_mondragon
Thanks @prakhar_srivastava!