# singer-tap-development
b
I was wondering if anyone happens to know a way to extend/overload the `STANDARD_KEYS` list and `Schema` data class within a tap/loader (`tap.py` or `client.py`)? I want to use the JSON Schema keywords `contentMediaType` and `contentEncoding` that were introduced in Draft 7, but noticed they don't show up in schema messages until they are added to the `STANDARD_KEYS` list and `Schema` data class found in `singer-sdk\_singerlib\schema.py`. I put some links to more info in a reply to this question.
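To make the behavior concrete, here is a deliberately simplified stand-in for the SDK's `Schema` class. It is not the real `singer-sdk` code: the `WhitelistSchema` name and the three-item key list are assumptions for illustration. Because `from_dict` only reads keys listed in `STANDARD_KEYS`, any other keyword, such as `contentMediaType`, is silently dropped on the round trip.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Hypothetical, reduced whitelist; the real STANDARD_KEYS list is longer.
STANDARD_KEYS = ["type", "format", "description"]


@dataclass
class WhitelistSchema:
    """Simplified stand-in that only keeps whitelisted keywords."""

    type: str | list[str] | None = None
    format: str | None = None
    description: str | None = None

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> WhitelistSchema:
        # Only whitelisted keys are read; contentMediaType etc. are ignored.
        return cls(**{k: data[k] for k in STANDARD_KEYS if k in data})

    def to_dict(self) -> dict[str, Any]:
        # Emit only the whitelisted attributes that were actually set.
        return {
            k: getattr(self, k) for k in STANDARD_KEYS if getattr(self, k) is not None
        }


column = {"type": ["string", "null"], "contentMediaType": "application/xml"}
print(WhitelistSchema.from_dict(column).to_dict())  # {'type': ['string', 'null']}
```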
e
Hey Dan! Yeah, the way that class is constructed, it'll never catch up with all the intricacies of the JSON Schema spec. I wonder if it should at least be able to deserialize arbitrary keys into something like an `_extra` field.
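A minimal sketch of that idea, assuming a dataclass-based schema class: `from_dict` routes every keyword that is not in `STANDARD_KEYS` into an `_extra` dict, and `to_dict` writes those keys back at the top level, so nothing is lost in the emitted schema message. The `FlexibleSchema` name, the reduced key list, and the field layout are hypothetical, not the actual `singer-sdk` implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Hypothetical, reduced whitelist of keywords that get dedicated fields.
STANDARD_KEYS = ["type", "format", "properties"]


@dataclass
class FlexibleSchema:
    type: str | list[str] | None = None
    format: str | None = None
    properties: dict[str, FlexibleSchema] | None = None
    # Catch-all for keywords without a dedicated field (e.g. contentMediaType).
    _extra: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> FlexibleSchema:
        kwargs: dict[str, Any] = {}
        extra: dict[str, Any] = {}
        for key, value in data.items():
            if key == "properties":
                # Recurse so nested property schemas keep their extras too.
                kwargs[key] = {k: cls.from_dict(v) for k, v in value.items()}
            elif key in STANDARD_KEYS:
                kwargs[key] = value
            else:
                extra[key] = value
        return cls(**kwargs, _extra=extra)

    def to_dict(self) -> dict[str, Any]:
        result: dict[str, Any] = {}
        for key in STANDARD_KEYS:
            value = getattr(self, key)
            if value is None:
                continue
            if key == "properties":
                result[key] = {k: v.to_dict() for k, v in value.items()}
            else:
                result[key] = value
        # Unknown keywords go back at the top level, not under "_extra".
        result.update(self._extra)
        return result


schema = {
    "type": "object",
    "properties": {
        "TestColumn1": {
            "type": ["string", "null"],
            "contentMediaType": "application/xml",
        }
    },
}
print(FlexibleSchema.from_dict(schema).to_dict() == schema)  # True
```

The design choice here is that `_extra` is purely an internal holding area: readers of the emitted schema never see the name.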
b
Hey Edger! Thanks for the response. 🙏 Sorry for the delayed reply. 🙇 I think I am following; please let me know if I have the concept right. The `from_dict` method in the `Schema` class is the deserializer and the `to_dict` method is the serializer. Any passed keys not present in `STANDARD_KEYS` could be added to `_extra`, which would be a dictionary. The tap schema message might look like this: `"TestColumn1": {"type": ["string", "null"], "_extra": {"contentMediaType": "application/xml"}}`. In the `to_sql_type` method on the target side, I would look for it with something like `if jsonschema_type.get("_extra", {}).get("contentMediaType") == "application/xml":`.
e
Nope, the emitted schema would look as expected, with all the original keys in place. The deserializer would collect the unknown keys in `_extra`, and serializing would write them back into the schema at the top level, so in the target you'd see:
```
"TestColumn1": {"type": ["string", "null"], "contentMediaType": "application/xml"}
```
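Since the keyword would arrive at the top level of each property schema, a target can read it directly instead of digging into `_extra`. The sketch below is hypothetical: `is_xml`, `to_sql_type_name`, and the returned type names (including the `base64` to `VARBINARY` mapping) stand in for a real `to_sql_type` implementation, which in the SDK's SQL connectors typically returns SQLAlchemy types rather than strings.

```python
from __future__ import annotations

from typing import Any


def is_xml(jsonschema_type: dict[str, Any]) -> bool:
    """True if the property schema declares XML content via contentMediaType."""
    return jsonschema_type.get("contentMediaType") == "application/xml"


def to_sql_type_name(jsonschema_type: dict[str, Any]) -> str:
    """Hypothetical mapping from a property schema to a SQL type name."""
    types = jsonschema_type.get("type", [])
    if isinstance(types, str):
        types = [types]
    if "string" in types:
        if is_xml(jsonschema_type):
            return "XML"
        if jsonschema_type.get("contentEncoding") == "base64":
            return "VARBINARY"
        return "VARCHAR"
    return "TEXT"  # fallback for anything this sketch does not handle


column = {"type": ["string", "null"], "contentMediaType": "application/xml"}
print(to_sql_type_name(column))  # XML
```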
b
Nice, good thing I asked about that; I was on the wrong page. Out of curiosity, were you thinking of adding new methods to handle serializing and deserializing, or just updating `to_dict` and `from_dict`?
I was playing around today and added `"_extra"` to `STANDARD_KEYS` and `_extra: dict | None = None` to the `Schema` class fields. With the following change in `from_dict`, I got what I mentioned earlier: `"TestColumn1": {"type": ["string", "null"], "_extra": {"contentMediaType": "application/xml"}}`
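```python
# Inside Schema.from_dict: route non-standard keys into the _extra dict
for key in data.keys():
    if key in STANDARD_KEYS:
        kwargs[key] = data[key]
    elif key not in kwargs:
        if kwargs.get("_extra"):
            kwargs["_extra"][key] = data[key]
        else:
            # First non-standard key seen: start a new _extra dict
            kwargs["_extra"] = {}
            kwargs["_extra"][key] = data[key]

return cls(**kwargs)
```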
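That nested output is what you would expect if `to_dict` walks `STANDARD_KEYS` and copies each attribute verbatim: once `"_extra"` is in the list, it is emitted as an ordinary key. A minimal sketch of the matching `to_dict` change, assuming a flat schema and a reduced key list (the `ColumnSchema` class, `to_dict_nested`, and the key list are all hypothetical), is to skip `_extra` in the loop and merge its contents into the result instead.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Hypothetical, reduced key list mirroring the experiment: "_extra" was added.
STANDARD_KEYS = ["type", "format", "_extra"]


@dataclass
class ColumnSchema:
    type: str | list[str] | None = None
    format: str | None = None
    _extra: dict | None = None

    def to_dict_nested(self) -> dict[str, Any]:
        """Copies every standard key verbatim, so _extra ends up nested."""
        return {
            key: getattr(self, key)
            for key in STANDARD_KEYS
            if getattr(self, key) is not None
        }

    def to_dict(self) -> dict[str, Any]:
        """Skips _extra in the loop and hoists its keys to the top level."""
        result = {
            key: getattr(self, key)
            for key in STANDARD_KEYS
            if key != "_extra" and getattr(self, key) is not None
        }
        result.update(self._extra or {})
        return result


col = ColumnSchema(type=["string", "null"], _extra={"contentMediaType": "application/xml"})
print(col.to_dict_nested())
# {'type': ['string', 'null'], '_extra': {'contentMediaType': 'application/xml'}}
print(col.to_dict())
# {'type': ['string', 'null'], 'contentMediaType': 'application/xml'}
```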