I'm running into an issue with `target-duckdb` - i...
# troubleshooting
s
I'm running into an issue with
target-duckdb
- it fails where
target-jsonl
succeeds while importing from
tap-mysql
, but only on some tables. A typical import error is
duckdb.InvalidInputException: Invalid Input Error: Attempting to execute an unsuccessful or closed pending query result
, which appears to be a catch-all that doesn't really give me much to go on. Any troubleshooting suggestions before I start doing binary search on problematic data in problematic tables?
a
That's rough. If you could get the actual SQL that's failing, or any other logs at all, that would probably point us in a specific direction.
s
Thank you. I'll try to isolate the problematic rows in the problematic tables and post some logs. There's a debug setting to use, right?
Ran the debug-level log and it appears to be an encoding issue:
'utf-8' codec can't decode byte 0xb1 in position 129: invalid start byte
. This tracks, since there are some Czech characters in the text - but I think they're stored as UTF-8 in the MySQL database, and what's more, the point at which the import stops appears to have little to do with the actual content. If I set
batch_size_rows: 1
, the next row in the import typically has no Unicode special characters, let alone this particular one.
It happens even when I exclude columns that could conceivably have special characters and leave only int/bool columns in the import.
Insult to injury, there are plenty of tables/columns where Unicode special characters do appear to be imported just fine - so it's not just the Czech characters being a problem.
Finally, it looks like there's a potential bug in target-duckdb wherein the automatic incremental strategy when there's an auto-incrementing primary key creates an SQL that has a syntax error?
Copy code
2023-03-11T01:06:55.086247Z [info     ] duckdb.ParserException: Parser Error: syntax error at or near "FROM" cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb
2023-03-11T01:06:55.087261Z [info     ] LINE 1: UPDATE debatovanicz."clovek" SET  FROM tmp_763b3f58_4990_4954_85dc_b29298... cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb
2023-03-11T01:06:55.088326Z [info     ]                                           ^ cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb
In case you're curious: it was probably the encoding.