Heya folks, I’ve just set up my first ELT and am h...
# troubleshooting
d
Heya folks, I’ve just set up my first ELT and am hoping someone can help with some things. I’m using the default variants for
tap-postgres
and
target-snowflake
and am having some issues with how certain data types are being handled. My test postgres database has a single table that has columns that include types
text[]
and
json
. This caused my run to fail with:
Copy code
'`schema` is an invalid JSON Schema instance: {"type": "SCHEMA", "stream": "things-employees", "schema": {"type": "object", "properties": {"id": {"type": ["null", "string"]}, "first_name": {"type": ["null", "string"]}, "last_name": {"type": ["null", "string"]}, "salary": {"type": ["null", "number"], "exclusiveMaximum": true, "maximum": 10000000000, "multipleOf": 0.01, "exclusiveMinimum": true, "minimum": -10000000000}, "roles": {"type": ["null", "array"], "items": {"$ref": "#/definitions/sdc_recursive_string_array"}}, "config": {"type": ["null", "object", "array"]}, "updated_at": {"type": ["null", "string"], "format": "date-time"}}, "definitions": {"sdc_recursive_integer_array": {"type": ["null", "integer", "array"], "items": {"$ref": "#/definitions/sdc_recursive_integer_array"}}, "sdc_recursive_number_array": {"type": ["null", "number", "array"], "items": {"$ref": "#/definitions/sdc_recursive_number_array"}}, "sdc_recursive_string_array": {"type": ["null", "string", "array"], "items": {"$ref": "#/definitions/sdc_recursive_string_array"}}, "sdc_recursive_boolean_array": {"type": ["null", "boolean", "array"], "items": {"$ref": "#/definitions/sdc_recursive_boolean_array"}}, "sdc_recursive_timestamp_array": {"type": ["null", "string", "array"], "format": "date-time", "items": {"$ref": "#/definitions/sdc_recursive_timestamp_array"}}, "sdc_recursive_object_array": {"type": ["null", "object", "array"], "items": {"$ref": "#/definitions/sdc_recursive_object_array"}}}}, "key_properties": [], "bookmark_properties": []}\n', '`$ref` path "{\'type\': [\'null\', \'string\', \'array\'], \'items\': {\'$ref\': \'#/definitions/sdc_recursive_string_array\'}}" is recursive'
I dropped these two columns (didn’t test yet to see which was the problem) and then my run completed successfully. However there is a second issue. The
salary
column is of type
float
in snowflake rather than
number(12,2)
which would better match the source type. Is there a way to work with
text[]/json
(whichever is causing the issue), and to specify the mapping of source to destination data type? Or would switching to one of the other tap/target variants help?
a
What you describe sounds similar to issues I’ve run into with tap-mssql here. The first issue sounds like it stems from an unhandled option in the case statement driving conversion from postgres types to json schema types. I haven’t looks at their code, but my guess is you’d likely need to fork the repo and add a case for this/these types, and/or open a ticket or PR on that fork. Or, as you suggest, it’s possible another fork already has this handling. The second issue is similar but more nuanced. If I correctly understand the info here regarding how json schema handles data types, I don’t think there’s a distinction between float and decimal types in json schema and this particular snowflake target treats all non-ints as floats.
It looks like the postgres tap (assuming you are on the pipelinewise fork) does support json and text[] (if I’m reading that right). The issue might entirely be in the snowflake target, and if that’s the case, you might benefit from switching to the pipelinewise-target-snowflake variant.
d
Thanks, I’ll try that out! I’m having trouble figuring out what
file_format
is meant to be in the transferwise
target-snowflake
, but I’m also very new to Snowflake so I’m going to read through the SF docs and see if that clears anything up
a
Circling back from a DM in case anyone else is tracking or curious, the solution here was to create a file format using the prereq instructions here: https://github.com/transferwise/pipelinewise-target-snowflake#pre-requirements And it looks like we were correct about the pipelinewise target supporting those data types. 🙂 🙌