# singer-tap-development
a
Feel like I am monopolising the channel here! Really appreciate the responses. I have a field with the spec:
"timestamp_signup": {
  "type": [
    "null",
    "string"
  ],
  "format": "date-time"
},
But the API returns an empty string for this field when it is null:
"timestamp_signup": "",
Which results in:
ValueError: Could not parse value '' for field 'timestamp_signup'
2023-04-19T12:13:36.921311Z [info     ]     raise ParserError("String does not contain a date: %s", timestr) cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-04-19T12:13:36.922028Z [info     ] dateutil.parser._parser.ParserError: String does not contain a date: cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
This is only an issue when the object has the format=date-time designation. Do I have to special-case this myself, i.e. implement post_process() and coerce the empty string to a proper Python None? Or can Meltano handle 'false-y' values like this, and I'm just missing the magic command?
m
You can probably add a stream map in between the tap and the target to convert that empty string to null. That’s one way I think you can do it.
a
Thanks. This looks like a feature of this API, so it can occur unexpectedly in any date-time field. Just wondering if there is some kind of global way to coerce false-ish values to a true null.
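One "global" option, as a minimal sketch only: a recursive helper that turns every empty string into None, including inside nested objects and lists. The name empty_to_none and the sample record are hypothetical, not part of the SDK or of tap-mailchimp; the idea is that something like this could be called from a stream's post_process().

```python
from typing import Any


def empty_to_none(value: Any) -> Any:
    """Recursively replace empty strings with None (hypothetical helper)."""
    if isinstance(value, dict):
        return {k: empty_to_none(v) for k, v in value.items()}
    if isinstance(value, list):
        return [empty_to_none(v) for v in value]
    # Only the empty string is coerced; 0, False and [] are left untouched.
    return None if value == "" else value


# Made-up record shaped loosely like an API response.
record = {
    "email": "a@example.com",
    "timestamp_signup": "",
    "stats": {"last_open": "", "opens": 0},
    "tags": ["", "vip"],
}
cleaned = empty_to_none(record)
```

Note that this also nulls empty strings in fields where "" may be a legitimate value, so it is only appropriate if the API never uses "" meaningfully.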
p
@Andy Carter what tap is this? This feels like a tap bug to me. It's returning records that don't match the schema it provides for those records. If it's possible to update the tap, it should either convert the empty strings to nulls before emitting them, or change the schema so the field is not of date-time format, because not all records are date-times.
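A narrower version of the "convert before emitting" idea, sketched under the assumption that only top-level properties carry format=date-time: read the field names from the schema and null out empty strings for those fields only. The helper names and the sample schema are hypothetical; in an SDK stream the schema argument would presumably be the stream's own schema dict.

```python
from __future__ import annotations


def datetime_fields(schema: dict) -> set[str]:
    """Top-level property names declared with format=date-time."""
    return {
        name
        for name, spec in schema.get("properties", {}).items()
        if spec.get("format") == "date-time"
    }


def null_empty_datetimes(row: dict, schema: dict) -> dict:
    """Replace "" with None, but only for date-time fields (hypothetical helper)."""
    fields = datetime_fields(schema)
    return {k: None if k in fields and v == "" else v for k, v in row.items()}


# Made-up schema fragment using the field spec from the question.
schema = {
    "properties": {
        "timestamp_signup": {"type": ["null", "string"], "format": "date-time"},
        "email_address": {"type": ["null", "string"]},
    }
}
row = {"timestamp_signup": "", "email_address": ""}
result = null_empty_datetimes(row, schema)
```

Here only timestamp_signup is changed; the plain string field keeps its empty value.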
a
It's my own variant of tap-mailchimp, built using the SDK. I will check the hub variant to see how these fields are handled, as I don't recall getting many issues with it; maybe there was some internal conversion going on. I used the schema.json files from there as a starting point.
p
There's a post_process method (https://sdk.meltano.com/en/latest/classes/singer_sdk.Stream.html#singer_sdk.Stream.post_process) in the SDK that might be helpful for this type of clean-up.
a
Yes I figured I'm going to need something like:
def post_process(self, row: dict, context: dict | None = None) -> dict | None:
    """Convert empty strings to None.

    This API returns empty strings in place of nulls. They need to be
    converted to true nulls to get correct datetime handling; otherwise
    trying to generate a datetime from "" raises an error.
    """
    return {k: None if v == "" else v for k, v in row.items()}
but I wasn't sure at what point the object gets validated against the schema. If I just invoke the tap, will it get validated? Or only as I'm writing to the target? I THINK I didn't get the issue using target-jsonl from the hub, only with target-mssql.
p
@Andy Carter I believe the built-in test suite that the SDK provides will validate, but if the empty strings are infrequent then it might not catch them. Targets do the validation, and I thought target-jsonl would error on this, but I'm not positive. I've heard of people using it as a test case for validation.
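For catching infrequent bad values before they reach a target, a rough pre-flight check over a sample of records might look like the sketch below. It is an assumption-heavy stand-in, not what the SDK test suite or any target actually runs: find_bad_datetimes is a hypothetical helper, and the standard library's datetime.fromisoformat is used in place of the dateutil parser seen in the target-mssql traceback.

```python
from __future__ import annotations

from datetime import datetime


def find_bad_datetimes(records: list[dict], schema: dict) -> list[tuple[int, str, object]]:
    """Return (record index, field, value) for date-time fields that fail to parse."""
    fields = [
        name
        for name, spec in schema.get("properties", {}).items()
        if spec.get("format") == "date-time"
    ]
    problems = []
    for i, record in enumerate(records):
        for field in fields:
            value = record.get(field)
            if value is None:
                continue  # null is allowed by the ["null", "string"] type
            try:
                # Stand-in parser; older Pythons do not accept a trailing "Z".
                datetime.fromisoformat(str(value).replace("Z", "+00:00"))
            except ValueError:
                problems.append((i, field, value))
    return problems


# Made-up schema and records for illustration.
schema = {
    "properties": {
        "timestamp_signup": {"type": ["null", "string"], "format": "date-time"},
    }
}
records = [
    {"timestamp_signup": "2023-04-19T12:13:36+00:00"},
    {"timestamp_signup": ""},
    {"timestamp_signup": None},
]
problems = find_bad_datetimes(records, schema)
```

In this sample only the second record, the one with the empty string, is reported.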