chrish
01/28/2023, 2:36 PM
\\udc81 surrogates not allowed
I'm probably the only one, but sometimes the data I get is less than perfectly clean. There are some random characters that don't encode cleanly from Unicode to UTF-8.
I'm using the datamill-co variant of target-postgres and it's using psycopg2 to talk to Postgres.
I'd like to tell the loader to just ignore these errors (perhaps replacing the bad character with a specific char I can find later), log the issue, and move on. I've configured the loader with invalid_record_threshold=10, but that doesn't seem to help.
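To make the "replace it, log it, move on" idea concrete, here is a rough sketch of a pre-clean step in Python. It is not a target-postgres or psycopg2 option; it assumes the records can be passed through a small filter of your own before they reach the loader. The names `clean_text`, `clean_record`, and `PLACEHOLDER` are made up for illustration.

```python
import logging
import re

logger = logging.getLogger("surrogate_cleaner")

# Lone surrogate code points (U+D800..U+DFFF) are what trigger
# "surrogates not allowed" when a str is encoded as UTF-8.
_SURROGATES = re.compile(r"[\ud800-\udfff]")

# Hypothetical marker choice: U+FFFD REPLACEMENT CHARACTER.
# Swap in any character that is easy to search for later.
PLACEHOLDER = "\ufffd"


def clean_text(value, field="?"):
    """Replace lone surrogates in a string and log how many were found."""
    cleaned, count = _SURROGATES.subn(PLACEHOLDER, value)
    if count:
        logger.warning("replaced %d surrogate(s) in %s", count, field)
    return cleaned


def clean_record(value, field="record"):
    """Recursively clean string values in nested dicts and lists.

    Dict keys are left untouched; only values are cleaned.
    """
    if isinstance(value, str):
        return clean_text(value, field)
    if isinstance(value, dict):
        return {k: clean_record(v, f"{field}.{k}") for k, v in value.items()}
    if isinstance(value, list):
        return [clean_record(v, f"{field}[{i}]") for i, v in enumerate(value)]
    return value


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    dirty = {"name": "abc\udc81def", "tags": ["ok", "bad\ud800"], "n": 3}
    clean = clean_record(dirty)
    # Encoding now succeeds instead of raising UnicodeEncodeError.
    clean["name"].encode("utf-8")
```

Searching the loaded table for the placeholder character afterwards should surface the rows that were touched, and the log lines show which fields they came from.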
How do other folks deal with issues like this? Is there a way to configure the loader to ignore them? Do you painstakingly pre-clean the data?
thomas_briggs
01/28/2023, 7:33 PM
chrish
01/29/2023, 1:04 PM