# troubleshooting
j
Hi all, I'm currently trying to move some data with the BuzzCutNorman variant of tap-mssql, and I ran into a weird error.
2024-11-20 02:49:19 +0000 - dagster - INFO - elt_run - 81ea9088-1acd-4042-ae40-7b0c958f9bd9 - elt_run_elOp[licensePlateLookup] - 2024-11-20T02:49:18.950577Z [info ] UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 18-19: unexpected end of data
I assume the issue is that there are entries in my data table (the column is an NVARCHAR(10)) that are not playing nice with the utf-16-le encoding. I was able to load 90% of the rows in an initial warmup. Looking for thoughts on what I can do to address this. I'm pretty sure NVARCHAR uses UTF-16LE in Microsoft SQL Server, though?
@BuzzCutNorman if you have any thoughts on this, I'd appreciate it!
I realized that since this is an NVARCHAR(10), each value is at most 20 bytes. So it's dying on the 10th character (bytes 18-19, counting from byte 0), i.e. on anything that uses the full length of 10. I think I found my culprit (data masked because of sensitivity).
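A minimal Python sketch of why bytes 18-19 fail, assuming (the real value is masked, so this is a guess) that the 10th code unit is an unpaired high surrogate, such as the first half of an emoji. NVARCHAR(n) is sized in two-byte code units rather than characters, and a character outside the Basic Multilingual Plane takes two code units. Python's strict utf-16-le decoder reports "unexpected end of data" when the input ends right after a high surrogate. The helper decode_nvarchar and the value "ABC123XYZ" are hypothetical.

```python
def decode_nvarchar(raw: bytes) -> str:
    """Strictly decode raw NVARCHAR bytes as UTF-16LE (hypothetical helper)."""
    return raw.decode("utf-16-le")

# An emoji outside the Basic Multilingual Plane needs two UTF-16 code units:
# U+1F600 -> high surrogate D83D + low surrogate DE00 (4 bytes in total).
emoji = "\U0001F600"
print(len(emoji.encode("utf-16-le")))  # 4

# Hypothetical value filling NVARCHAR(10): 9 ASCII characters (18 bytes) plus
# only the high-surrogate half of the emoji (2 bytes) = 20 bytes.
raw = "ABC123XYZ".encode("utf-16-le") + emoji.encode("utf-16-le")[:2]
print(len(raw))  # 20

try:
    decode_nvarchar(raw)
except UnicodeDecodeError as exc:
    # 'utf-16-le' codec can't decode bytes in position 18-19: unexpected end of data
    print(exc)
```

The same message and byte positions as in the log line suggest that only the last code unit of a full-length value is at fault, which matches finding a single bad row.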
b
@joshua_janicas Sorry, I didn't see this until this morning. Were you able to finish the load, or are you still getting stuck?
j
That's ok, I messaged at midnight my time so I had no expectation anyone would get to this until later 😉
I'm staging some scripts for my DevOps team to run, and I'll find out in a few hours if it worked.
I'll let you know either way how it goes
Outside of needing to make sure on my side that the data we accept into the table is properly sanitized to alphanumerics, I wonder if there's anything else that can be done to mitigate these kinds of situations during the extract step. Likely not, since it's the decoding itself that's failing...
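Nothing in this thread establishes that tap-mssql has a setting for this. As a minimal Python sketch of the two options a decoding step generally has, assuming the raw column bytes are available (for instance by selecting CAST(col AS VARBINARY(20)) instead of the NVARCHAR column): flag the rows that don't decode, or decode leniently so that U+FFFD replaces the broken sequence. find_undecodable, decode_lenient, and the sample rows are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_undecodable(rows: Iterable[Tuple[object, bytes]]) -> List[object]:
    """Return the keys of rows whose raw bytes are not valid UTF-16LE."""
    bad = []
    for key, raw in rows:
        try:
            raw.decode("utf-16-le")
        except UnicodeDecodeError:
            bad.append(key)
    return bad


def decode_lenient(raw: bytes) -> str:
    """Decode UTF-16LE, substituting U+FFFD for invalid sequences instead of raising."""
    return raw.decode("utf-16-le", errors="replace")


# Hypothetical (key, raw bytes) pairs; row 2 ends in a lone high surrogate.
rows = [
    (1, "AB12CD".encode("utf-16-le")),
    (2, "ABC123XYZ".encode("utf-16-le") + b"\x3d\xd8"),
]
print(find_undecodable(rows))  # [2]
print(decode_lenient(rows[1][1]))  # ABC123XYZ followed by U+FFFD
```

Flagging keeps the load strict and points at the rows to fix at the source. Replacing lets the load finish but silently changes the stored value, which matters if the column is used as a lookup key.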
b
Yes, please let me know what you find. Like you, I can't think of a mitigation I could put in place for this off the top of my head.
j
Yup, that was definitely the issue. I fixed the row, and we punched through and managed to E+L everything.