Another Q Am I missing a setting for causing a stream to fai Meltano #troubleshooting

mathew_fournier

06/02/2023, 11:11 PM

Another Q: Am I missing a setting for causing a stream to fail if the records do not match the schema? I have a schema defined with a required field, and if I emit a record from `get_records` that is missing that field there is no error, not even a warning. `TYPE_CONFORMANCE_LEVEL` seems to only check for extra keys. But it doesn’t validate a message against the actual schema, as far as I can tell in `_typing.py`. Is message -> schema validation left as an implementation detail for the implementors of a stream/tap?

visch

06/02/2023, 11:22 PM

I'm pretty sure that if you don't have `null` in the list of types, it fails if the key isn't there, but I haven't tested that in a few releases.

mathew_fournier

06/02/2023, 11:23 PM

I’m testing it atm and if my record does not have the required key it is still emitted by the tap. Given this

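```
from singer_sdk import typing as th

# "id" and "name" are required; the remaining properties are optional (nullable).
schema = th.PropertiesList(
    th.Property("id", th.IntegerType, required=True),
    th.Property("name", th.StringType, required=True),
    th.Property("company", th.StringType),
    th.Property("offset", th.IntegerType),
    th.Property("test", th.StringType),
).to_dict()
```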

mathew_fournier

06/02/2023, 11:24 PM

I can successfully emit messages in `get_records`, in a class that subclasses `Stream`, that don’t have the `id` or `name` fields. I.e., `required` is not being enforced by anything in the SDK within `core.py`.

mathew_fournier

06/02/2023, 11:26 PM

example output of invoking the tap

mathew_fournier

06/02/2023, 11:26 PM

```
{"type": "SCHEMA", "stream": "test_required", "schema": {"properties": {"id": {"type": ["integer"]}, "name": {"type": ["string"]}, "company": {"type": ["string", "null"]}, "offset": {"type": ["integer", "null"]}, "test": {"type": ["string", "null"]}}, "type": "object", "required": ["id", "name"]}, "key_properties": ["id"], "bookmark_properties": ["offset"]}
{"type": "RECORD", "stream": "test_required", "record": {"id": "lkhjsdfs", "company": "bb", "offset": -99, "test": "rerun"}, "time_extracted": "2023-06-02T23:25:58.969239+00:00"}
{"type": "STATE", "value": {"bookmarks": {"test_required": {"replication_key": "offset", "replication_key_value": -99}}}}
```

mathew_fournier

06/02/2023, 11:27 PM

The emitted record is definitely invalid according to the schema emitted. The type on `id` doesn’t match, and it is missing the `name` field.
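For anyone wanting to reproduce the mismatch outside the tap, here is a minimal sketch that checks the record against the schema from the tap output above. It assumes the third-party `jsonschema` package is installed and uses its `Draft7Validator`; it is an illustration only, not something the SDK runs for you.

```python
from jsonschema import Draft7Validator

# Schema and record copied from the tap output above.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer"]},
        "name": {"type": ["string"]},
        "company": {"type": ["string", "null"]},
        "offset": {"type": ["integer", "null"]},
        "test": {"type": ["string", "null"]},
    },
    "required": ["id", "name"],
}
record = {"id": "lkhjsdfs", "company": "bb", "offset": -99, "test": "rerun"}

# iter_errors reports every violation instead of stopping at the first one.
validator = Draft7Validator(schema)
errors = sorted(error.message for error in validator.iter_errors(record))

for message in errors:
    print(message)
```

This should report two problems: the string value in `id` and the missing `name` property.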

mathew_fournier

06/02/2023, 11:27 PM

Do you want me to file an issue about this?

edgar_ramirez_mondragon

06/03/2023, 12:15 PM

> Is message -> schema validation left as an implementation detail for the implementors of a stream/tap?

Not really. The SDK doesn't expect schema validation at runtime in the tap, but the testing framework recently got schema validation: https://github.com/meltano/sdk/pull/1711

visch

06/03/2023, 1:35 PM

Vaguely remembering how I deal with this now: I normally just run `meltano run tap-name target-jsonl`, as the target will validate the JSON. I forgot it's not done at the tap. Probably an efficiency thing. I think I have an issue in somewhere for enabling it via a config option. Hope that helps!
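A rough sketch of that downstream idea, using only the standard library: read the Singer messages a tap prints, remember the `required` list from each SCHEMA message, and flag RECORD messages that lack those keys. The helper name `find_missing_required` and the shortened sample messages are made up for illustration; this is not how `target-jsonl` is implemented, and it checks only required keys, not types.

```python
import json
from typing import Iterable, Iterator


def find_missing_required(lines: Iterable[str]) -> Iterator[tuple[str, list[str]]]:
    """Yield (stream, missing_keys) for each RECORD lacking keys its SCHEMA requires."""
    required: dict[str, list[str]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between messages
        message = json.loads(line)
        if message["type"] == "SCHEMA":
            # Remember the most recent schema's required keys for this stream.
            required[message["stream"]] = message["schema"].get("required", [])
        elif message["type"] == "RECORD":
            record = message["record"]
            missing = [k for k in required.get(message["stream"], []) if k not in record]
            if missing:
                yield message["stream"], missing


# Shortened versions of the messages from the thread.
tap_output = [
    '{"type": "SCHEMA", "stream": "test_required", '
    '"schema": {"type": "object", "required": ["id", "name"]}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "test_required", "record": {"id": "lkhjsdfs", "company": "bb"}}',
    '{"type": "STATE", "value": {}}',
]

problems = list(find_missing_required(tap_output))
print(problems)
```

In real use the lines would come from the tap's stdout (for example `sys.stdin` in a pipe) rather than a hard-coded list.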

mathew_fournier

06/05/2023, 4:06 AM

Yeah. Can totally see it being a config flag. Depends on your users and whatnot. I have hundreds, so validation is a must. I’ll go spelunking and see if I can tuck something behind a config option to turn on runtime validation (expensive).
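The config-flag idea could look something like the sketch below: a generator that wraps whatever `get_records` yields and runs a check only when validation is switched on, so the cost is paid only by those who opt in. The names `maybe_validate`, `require_id_and_name`, and the `validate_records` flag are all hypothetical; none of them is an SDK setting or API.

```python
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]


def maybe_validate(
    records: Iterable[Record],
    check: Callable[[Record], None],
    enabled: bool,
) -> Iterator[Record]:
    """Yield records unchanged, running `check` on each one only when enabled."""
    for record in records:
        if enabled:
            check(record)  # expected to raise on an invalid record
        yield record


def require_id_and_name(record: Record) -> None:
    """Stand-in check; a real one would validate against the full schema."""
    missing = [key for key in ("id", "name") if key not in record]
    if missing:
        raise ValueError(f"record is missing required fields: {missing}")


config = {"validate_records": True}  # hypothetical flag, not an SDK setting
rows = [{"id": 1, "name": "a"}, {"id": "lkhjsdfs", "company": "bb"}]

try:
    for row in maybe_validate(rows, require_id_and_name, config["validate_records"]):
        print(row)
except ValueError as exc:
    print(f"stream failed: {exc}")
```

With the flag off, both rows pass through untouched, which matches the behaviour described earlier in the thread.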

visch

06/05/2023, 12:23 PM

Found the issue: https://github.com/meltano/sdk/issues/227. Probably not too helpful, but who knows!

mathew_fournier

06/05/2023, 2:58 PM

🙏 ty for the info.
