# troubleshooting
m
Another Q: Am I missing a setting for causing a stream to fail if the records do not match the schema? I have a schema defined with a required field and if I emit a record from
get_records
that is missing that field there is no error, not even a warning.
TYPE_CONFORMANCE_LEVEL
seems to only check for extra keys. But it doesn’t validate a message against the actual schema as far as I can tell in
_typing.py
. Is message -> schema validation left as an implementation detail for the implementors of a stream/tap?
v
I'm pretty sure if you don't have null in the list of types it fails if the key isn't there, but I haven't tested that fact in a few releases
m
I’m testing it atm and if my record does not have the required key it is still emitted by the tap. Given this
schema = th.PropertiesList(
    th.Property("id", th.IntegerType, required=True),
    th.Property("name", th.StringType, required=True),
    th.Property("company", th.StringType),
    th.Property("offset", th.IntegerType),
    th.Property("test", th.StringType),
).to_dict()
I can successfully emit messages in
get_records
in a class that subclasses
Stream
that don’t have the
id
or
name
fields. I.e.,
required
is not being enforced by anything in the SDK within
core.py
Example output of invoking the tap:
```
{"type": "SCHEMA", "stream": "test_required", "schema": {"properties": {"id": {"type": ["integer"]}, "name": {"type": ["string"]}, "company": {"type": ["string", "null"]}, "offset": {"type": ["integer", "null"]}, "test": {"type": ["string", "null"]}}, "type": "object", "required": ["id", "name"]}, "key_properties": ["id"], "bookmark_properties": ["offset"]}
{"type": "RECORD", "stream": "test_required", "record": {"id": "lkhjsdfs", "company": "bb", "offset": -99, "test": "rerun"}, "time_extracted": "2023-06-02T232558.969239+00:00"}
{"type": "STATE", "value": {"bookmarks": {"test_required": {"replication_key": "offset", "replication_key_value": -99}}}}
```
The emitted record is definitely invalid according to the emitted schema. The type on
id
doesn’t match, and it is missing the
name
field.
Do you want me to file an issue about this?
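To make the two violations concrete, here is a minimal, standard-library-only sketch that checks the emitted record against the emitted schema. It is not SDK code: `find_violations` and `JSON_TYPES` are hypothetical names, and the checker is a simplified stand-in that understands only the `required` and `type` keywords. A full JSON Schema validator would cover far more.

```python
# Simplified stand-in for a JSON Schema validator: it understands only the
# "required" and "type" keywords, and only the types used in this thread.
JSON_TYPES = {"integer": int, "string": str, "null": type(None)}


def find_violations(record, schema):
    """Return a list of human-readable schema violations for one record."""
    problems = []
    # Every key listed under "required" must be present in the record.
    for key in schema.get("required", []):
        if key not in record:
            problems.append(f"missing required property: {key}")
    properties = schema.get("properties", {})
    for key, value in record.items():
        allowed = properties.get(key, {}).get("type", [])
        if isinstance(allowed, str):
            allowed = [allowed]
        if not allowed or any(t not in JSON_TYPES for t in allowed):
            continue  # undeclared key or unsupported type: not checked here
        # bool is a subclass of int in Python, so it must not pass as "integer".
        matches = any(
            isinstance(value, JSON_TYPES[t])
            and not (t == "integer" and isinstance(value, bool))
            for t in allowed
        )
        if not matches:
            problems.append(
                f"{key}: expected {' or '.join(allowed)}, got {type(value).__name__}"
            )
    return problems


# Schema and record copied from the tap output quoted in this thread.
schema = {
    "properties": {
        "id": {"type": ["integer"]},
        "name": {"type": ["string"]},
        "company": {"type": ["string", "null"]},
        "offset": {"type": ["integer", "null"]},
        "test": {"type": ["string", "null"]},
    },
    "type": "object",
    "required": ["id", "name"],
}
record = {"id": "lkhjsdfs", "company": "bb", "offset": -99, "test": "rerun"}

violations = find_violations(record, schema)
for violation in violations:
    print(violation)
# missing required property: name
# id: expected integer, got str
```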
e
Is message -> schema validation left as an implementation detail for the implementors of a stream/tap?
Not really. The SDK doesn't expect schema validation at runtime in the tap, but the testing framework recently got schema validation: https://github.com/meltano/sdk/pull/1711
v
Vaguely remembering how I deal with this now: I normally just run meltano run tap-name target-jsonl, since the target will validate the JSON. I forgot it's not done at the tap, probably for efficiency. I think I have an issue in somewhere for enabling it via a config option. Hope that helps!
m
Yeah, I can totally see it being a config flag. It depends on your users and whatnot. I have hundreds, so validation is a must. I’ll go spelunking and see if I can tuck something behind a config option to turn on runtime validation (which is expensive).
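One way the opt-in idea could look, as a rough standard-library-only sketch. Nothing here is an existing SDK option: the `validate_records` setting, `checked_records`, `missing_required`, and `RecordValidationError` are all hypothetical names, and only the `required` keyword is checked to keep the per-record cost small.

```python
class RecordValidationError(ValueError):
    """Raised when a record does not satisfy the stream schema."""


def missing_required(record, schema):
    """Return the required property names that the record lacks."""
    return [key for key in schema.get("required", []) if key not in record]


def checked_records(records, schema, config):
    """Yield records unchanged, validating each one only when enabled.

    `validate_records` is a hypothetical config key, off by default so that
    the per-record cost is paid only by users who opt in.
    """
    if not config.get("validate_records", False):
        yield from records
        return
    for record in records:
        missing = missing_required(record, schema)
        if missing:
            raise RecordValidationError(
                f"record is missing required properties: {missing}"
            )
        yield record


schema = {"required": ["id", "name"]}
rows = [{"id": 1, "name": "a"}, {"id": 2}]

# Flag off: the invalid second row passes through untouched.
passed = list(checked_records(rows, schema, {}))

# Flag on: iteration stops with an error at the first invalid row.
try:
    list(checked_records(rows, schema, {"validate_records": True}))
    error_message = None
except RecordValidationError as exc:
    error_message = str(exc)
```

In a tap, a generator like this could presumably wrap whatever `get_records` yields, so the check sits in one place and the default behaviour stays as cheap as it is today.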
v
Found the issue: https://github.com/meltano/sdk/issues/227. Probably not too helpful, but who knows!
m
🙏 ty for the info.