# meltano-plugin-development
I'm running into an error with invalid Unicode characters, and I'm wondering where the best place to tackle it would be. My initial thought was to handle it in my mapper with a custom eval function, but after trying that I realized the error occurs before eval even gets called; it seems to happen while the file is being streamed. I have a function that fixes the problem, I'm just not sure where to put it to avoid unnecessary processing and potentially breaking something else. Perhaps this should happen in the tap, processing each value before it gets saved? An example of this is ” vs ": the former shows up as a Unicode byte sequence in the file. My latest example is `\xe2\x80\x90`, which should be a `-` but isn't. In the database it looks like a hyphen, but when copied out and searched for with a regular hyphen, it doesn't match.
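For anyone hitting the same thing, here is a small sketch of why the search doesn't match. It assumes the bytes in the file are UTF-8 (the `describe` helper is just an illustrative name): the sequence decodes to U+2010 HYPHEN, which is a different code point from the ASCII hyphen-minus typed on a keyboard.

```python
import unicodedata

raw = b"\xe2\x80\x90"          # the byte sequence seen in the file
char = raw.decode("utf-8")     # assumption: the data is UTF-8 encoded


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(char))   # U+2010 HYPHEN
print(describe("-"))    # U+002D HYPHEN-MINUS
print(char == "-")      # False, so a search for "-" will not find it
```

So the value in the database is probably stored correctly; it just isn't the character a regular hyphen search looks for.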
Exception has occurred: UnicodeDecodeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
'charmap' codec can't decode byte 0x90 in position 2548: character maps to <undefined>
  File "C:\Users\daniell\.rye\py\cpython@3.12.4\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\git\dagster-hybrid\src\elt_projects\meltano\custom-plugins\map\column-mapper\.venv\Lib\site-packages\singer_sdk\_singerlib\encoding\_base.py", line 61, in _process_lines
    for line in file_input:
  File "C:\git\dagster-hybrid\src\elt_projects\meltano\custom-plugins\map\column-mapper\.venv\Lib\site-packages\singer_sdk\_singerlib\encoding\_base.py", line 48, in listen
    self._process_lines(file_input or self.default_input)
  File "C:\git\dagster-hybrid\src\elt_projects\meltano\custom-plugins\map\column-mapper\.venv\Lib\site-packages\singer_sdk\mapper_base.py", line 135, in invoke
    mapper.listen(file_input)
  File "C:\git\dagster-hybrid\src\elt_projects\meltano\custom-plugins\map\column-mapper\.venv\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\git\dagster-hybrid\src\elt_projects\meltano\custom-plugins\map\column-mapper\.venv\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\git\dagster-hybrid\src\elt_projects\meltano\custom-plugins\map\column-mapper\.venv\Lib\site-packages\singer_sdk\plugin_base.py", line 82, in invoke
    return super().invoke(ctx)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\git\dagster-hybrid\src\elt_projects\meltano\custom-plugins\map\column-mapper\.venv\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\git\dagster-hybrid\src\elt_projects\meltano\custom-plugins\map\column-mapper\.venv\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\git\dagster-hybrid\src\elt_projects\meltano\custom-plugins\map\column-mapper\column_mapper\__main__.py", line 7, in <module>
    ColumnMapperMapper.cli()
  File "C:\Users\daniell\.rye\py\cpython@3.12.4\Lib\runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "C:\Users\daniell\.rye\py\cpython@3.12.4\Lib\runpy.py", line 198, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2548: character maps to <undefined>
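Reading the traceback, the failure is in `cp1252.py` while the mapper iterates over its input lines, which suggests the input stream is being decoded with the Windows default code page rather than UTF-8. Byte `0x90` has no mapping in cp1252, hence the error. The sketch below reproduces that with the same three bytes. `try_decode` and `force_utf8_stdio` are hypothetical helper names, and whether reconfiguring the standard streams (or setting the `PYTHONUTF8=1` environment variable for the mapper process) actually resolves it in this setup is an assumption I haven't verified.

```python
import sys

raw = b"\xe2\x80\x90"  # UTF-8 bytes for U+2010 HYPHEN


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, returning an ASCII-safe repr or the error reason."""
    try:
        return ascii(data.decode(encoding))
    except UnicodeDecodeError as exc:
        return f"{type(exc).__name__}: {exc.reason}"


def force_utf8_stdio() -> None:
    """Possible workaround: switch stdin/stdout to UTF-8 before reading.

    Assumption: this would have to run before the mapper starts
    consuming its input, e.g. at the top of __main__.py.
    """
    for stream in (sys.stdin, sys.stdout):
        if hasattr(stream, "reconfigure"):
            stream.reconfigure(encoding="utf-8")


print(try_decode(raw, "utf-8"))   # decodes cleanly
print(try_decode(raw, "cp1252"))  # fails on byte 0x90, as in the traceback
```

If this is the cause, fixing the decoding would avoid having to touch the data at all.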
I ended up making adjustments in the tap while reading the records from the database. I'm not sure how the encoding gets determined by singer, but either way, the current solution works for me.
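Roughly, the tap-side cleanup could look like the sketch below. This is not my exact code: `ASCII_EQUIVALENTS` and `clean_value` are illustrative names, and the set of characters to replace is an assumption that should be adjusted to the data. In an SDK-based tap, a per-record hook such as `post_process` would be one place to call it.

```python
from typing import Any

# Typographic characters mapped to their plain ASCII look-alikes.
# Assumption: extend this table with whatever shows up in your data.
ASCII_EQUIVALENTS = str.maketrans({
    "\u2010": "-",   # HYPHEN
    "\u2011": "-",   # NON-BREAKING HYPHEN
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
})


def clean_value(value: Any) -> Any:
    """Recursively replace typographic characters in strings.

    Dicts and lists are walked; dict keys and every other type
    are returned unchanged.
    """
    if isinstance(value, str):
        return value.translate(ASCII_EQUIVALENTS)
    if isinstance(value, dict):
        return {key: clean_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [clean_value(item) for item in value]
    return value


record = {"title": "co\u2010op", "tags": ["\u201cquoted\u201d", 3]}
print(clean_value(record))
```

Note that this changes the stored values, so it only makes sense if the downstream consumers really want the ASCII characters.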