# singer-taps
Hey all, we are trying to enable Log Based Replication in tap-postgres from meltanolabs, but we found an issue with what I believe is the output format wal2json uses for array types. Here is the log output from tap-postgres for a stream where the property `from` is an array of strings:
2024-08-25 17:25:09 2024-08-25T20:25:09.739025Z [info     ] {"type":"RECORD","stream":"public-messages","record":{"id":"ab5*************","inbox_id":"723*************","from":"{feedback@slack.com}","_sdc_deleted_at":null,"_sdc_lsn":15809843519320},"time_extracted":"2024-08-25T20:25:09.736829+00:00"}
Discovery for this stream returns the following type for `from`:
"from": {
            "items": {
              "type": [
                "string"
              ]
            },
            "type": [
              "array",
              "null"
            ]
          },
The error is:
2024-08-25 17:25:09 2024-08-25T20:25:09.749134Z [info     ] 2024-08-25 20:25:09,741 | ERROR    | target-postgres.public-messages | Record validation failed cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-postgres name=target-postgres producer=False run_id=6c92cd7d-c5c3-49e0-bd8d-c5ed3fae827e stdio=stderr string_id=target-postgres
2024-08-25 17:25:09 2024-08-25T20:25:09.750096Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-postgres name=target-postgres producer=False run_id=6c92cd7d-c5c3-49e0-bd8d-c5ed3fae827e stdio=stderr string_id=target-postgres
2024-08-25 17:25:09 2024-08-25T20:25:09.750595Z [info     ]   File "/project/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/singer_sdk/sinks/core.py", line 121, in validate cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-postgres name=target-postgres producer=False run_id=6c92cd7d-c5c3-49e0-bd8d-c5ed3fae827e stdio=stderr string_id=target-postgres
2024-08-25 17:25:09 2024-08-25T20:25:09.750913Z [info     ]     self.validator.validate(record) cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-postgres name=target-postgres producer=False run_id=6c92cd7d-c5c3-49e0-bd8d-c5ed3fae827e stdio=stderr string_id=target-postgres
2024-08-25 17:25:09 2024-08-25T20:25:09.751121Z [info     ]   File "/project/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/jsonschema/validators.py", line 451, in validate cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-postgres name=target-postgres producer=False run_id=6c92cd7d-c5c3-49e0-bd8d-c5ed3fae827e stdio=stderr string_id=target-postgres
2024-08-25 17:25:09 2024-08-25T20:25:09.751311Z [info     ]     raise error                cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-postgres name=target-postgres producer=False run_id=6c92cd7d-c5c3-49e0-bd8d-c5ed3fae827e stdio=stderr string_id=target-postgres
2024-08-25 17:25:09 2024-08-25T20:25:09.751585Z [info     ] jsonschema.exceptions.ValidationError: '{feedback@slack.com}' is not of type 'array', 'null' cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-postgres name=target-postgres producer=False run_id=6c92cd7d-c5c3-49e0-bd8d-c5ed3fae827e stdio=stderr string_id=target-postgres
More on 🧵
✅ 1
The type definition of the `from` field is correct: it is an array of strings. The issue seems to be that wal2json encodes the array as `{value}`, and the jsonschema validator in singer_sdk is not prepared for that format. As you can see in this test from the wal2json repo, array types are encoded as `{value1,value2}`: https://github.com/eulerto/wal2json/blob/75629c2e1e81a12350cc9d63782fc53252185d8d/expected/include_domain_data_type.out#L91
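To make the mismatch concrete, here is a rough sketch of how a string like `{value1,value2}` could be turned into a list before the record is emitted. This is not the fix that went into tap-postgres. `parse_pg_array` and `_finish` are hypothetical names, and the sketch assumes one-dimensional arrays of text only, with no nested arrays or custom delimiters.

```python
from typing import List, Optional


def _finish(buf: List[str], was_quoted: bool) -> Optional[str]:
    """Turn the collected characters of one element into a value."""
    text = "".join(buf)
    # An unquoted NULL means SQL NULL; a quoted "NULL" is the literal string.
    if not was_quoted and text.upper() == "NULL":
        return None
    return text


def parse_pg_array(literal: str) -> List[Optional[str]]:
    """Parse a one-dimensional Postgres text array literal such as '{a,b}'.

    Hypothetical helper for illustration; nested arrays are not handled.
    """
    if not (literal.startswith("{") and literal.endswith("}")):
        raise ValueError(f"not an array literal: {literal!r}")
    body = literal[1:-1]
    if body == "":
        return []

    items: List[Optional[str]] = []
    buf: List[str] = []
    in_quotes = False
    was_quoted = False
    i = 0
    while i < len(body):
        ch = body[i]
        if in_quotes:
            if ch == "\\" and i + 1 < len(body):
                # Inside quotes, a backslash escapes the next character.
                i += 1
                buf.append(body[i])
            elif ch == '"':
                in_quotes = False
            else:
                buf.append(ch)
        elif ch == '"':
            in_quotes = True
            was_quoted = True
        elif ch == ",":
            # An unquoted comma ends the current element.
            items.append(_finish(buf, was_quoted))
            buf, was_quoted = [], False
        else:
            buf.append(ch)
        i += 1
    items.append(_finish(buf, was_quoted))
    return items


print(parse_pg_array("{feedback@slack.com}"))
```

With the value from the failing record, this returns a one-element list, which would satisfy the `["array", "null"]` type from discovery.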
This was merged and released 🙂
❤️ 1