# troubleshooting
a
Hello! I have an MSSQL to Snowflake Meltano ETL set up which is running nicely for most tables; however, for a few tables I am getting Unicode errors. I'm trying to find a way to deal with these, and am hoping there is a straightforward config entry I can use to either ignore them or manage them in some way.
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 100069 (22P02): 01b9953c-0001-7d99-0002-033a00053a1a: Error parsing JSON: UTF-16 single low surrogate: 0xdc81
The error refers to the following data, found in the field it is trying to process (the value is quite happily queryable from the MSSQL database and returns an accented capital A and an accented lowercase a):
from \u00c3\udc81lvaro Catal\u00c3\u00a1n Merino
Any hints would be appreciated, thanks.
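For context on what makes that value problematic: \udc81 is a lone UTF-16 low surrogate, which is not a valid Unicode scalar value and cannot be serialized as strict UTF-8 (or as well-formed JSON). A quick illustrative Python check, using only the sample value above and nothing pipeline-specific:

```python
# Illustration only: the sample value from the error contains a lone
# UTF-16 low surrogate (\udc81), which is not a valid Unicode scalar value.
value = "\u00c3\udc81lvaro Catal\u00c3\u00a1n Merino"

try:
    value.encode("utf-8")  # strict UTF-8 encoding rejects lone surrogates
except UnicodeEncodeError as exc:
    print(exc)  # "'utf-8' codec can't encode character '\udc81' ..."
```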
e
If we find a way to make snowflake-sqlalchemy/snowflake-connector-python accept a different encoding, or handle these cases by replacing the offending character, we can make that a setting.
a
Thanks for coming back to me @Edgar Ramírez (Arch.dev). Your comment makes complete sense, and for now the cases seem to be few enough that I can manually alter them in the source data as a workaround. Out of interest, would a transform step be able to pick this sort of thing up (i.e. does it work on the JSON so that "errors" can be discovered and logged before choosing whether to push the record through to a target)? (I've not looked at transforms yet, so don't know much about them.)
For completeness, in case anyone finds this while looking at similar issues: using a mapping helped me solve this. I added a meltano-map-transformer mapping and pre-processed the rows to remove dodgy bytes so that the sqlalchemy package could deal with the data and pass it to Snowflake (example is from another problematic table):

mappings:
  - name: fix_bad_bytes
    config:
      stream_maps:
        customers:
          FIRSTNAME: FIRSTNAME.encode('utf-8', 'replace').decode('utf-8')
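For anyone wondering what that expression actually does, here is a minimal Python sketch (illustrative only, using the sample value from earlier in the thread) showing how encode('utf-8', 'replace') swaps the unencodable lone surrogate for '?' so the value round-trips cleanly before it reaches Snowflake:

```python
# What the stream-map expression does to the sample value:
# errors='replace' substitutes '?' for the unencodable lone surrogate,
# leaving a string that is valid UTF-8/JSON for the Snowflake loader.
value = "\u00c3\udc81lvaro Catal\u00c3\u00a1n Merino"

cleaned = value.encode("utf-8", "replace").decode("utf-8")
print(cleaned)  # -> 'Ã?lvaro CatalÃ¡n Merino' (surrogate replaced with '?')
```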