# troubleshooting
Hello! I have a question about extract-load performance. I set up tap-oracle and target-mysql, started the pipeline, and realized the speed is ridiculously slow: about 10k rows in 10 minutes, with a result table size of 2.3 MB.

• The MySQL target database and Meltano run on the same host, so it's probably not a network speed issue.
• I tried swapping target-mysql for target-jsonl, and performance was way better: it took about 1 second to load 10k rows to disk.

Is there any way to improve performance?
That seems pretty bad in terms of performance. I don't think we have enough information to say exactly what may be going wrong with your test, and tbh I have never had to use target-mysql. Would you be able to run the extract-load at log level debug and share the log output? I'm not sure where the system is bottlenecked. Looking at the target-mysql codebase, it seems that data loading is performed here, where the library leverages SQLAlchemy's bulk insert operation. The actual query that gets executed is generated by this inherited method, `generate_insert_statement`, which emits an `INSERT INTO ...` statement.
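For illustration, the bulk-insert path described above can be sketched with the standard-library `sqlite3` driver standing in for the MySQL/SQLAlchemy stack; the table and column names here are made up, not taken from target-mysql:

```python
import sqlite3

# In-memory database standing in for the MySQL target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

records = [(i, i * 1.5) for i in range(10_000)]

# One prepared INSERT executed for the whole batch -- the
# executemany-style path a bulk insert takes, rather than
# issuing 10k separate single-row statements.
conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", records)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 10000
```

If the log shows one statement per row instead of batched inserts like this, that alone could account for the slowdown.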
šŸ™ 1
I'm not sure if that explains your extremely long load times, but it's possible to optimize that by instead loading the data via MySQL's `LOAD DATA` statement.
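A minimal sketch of that approach: dump the batch to a CSV file and hand it to the server in one pass. `LOAD DATA LOCAL INFILE` is real MySQL syntax, but the table and column names below are hypothetical, and the statement is only printed here, not executed against a live server:

```python
import csv
import tempfile

# Hypothetical batch of records; column names are illustrative.
records = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 4.50}]
columns = ["id", "amount"]

# Write the batch to a CSV file the MySQL client can stream to the server.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as f:
    writer = csv.DictWriter(f, fieldnames=columns)
    writer.writerows(records)
    csv_path = f.name

# You would run this through your MySQL driver with local_infile enabled;
# it ingests the whole file server-side instead of row-by-row inserts.
stmt = (
    f"LOAD DATA LOCAL INFILE '{csv_path}' "
    "INTO TABLE orders "
    "FIELDS TERMINATED BY ',' "
    f"({', '.join(columns)})"
)
print(stmt)
```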
šŸ™ 1
I suspect the above doesn't fully explain the slowness, and there may be a bottleneck elsewhere within target-mysql. Still looking within the `bulk_insert_records` method, I wonder if the code to conform records for MySQL (lines 703 - 714) is taking up a lot of time, because for each `record` the loop constructs a new `insert_record`, appends it to a list `insert_records`, and then uses that list to emit the query.
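The pattern being described, a per-record conform loop that builds `insert_records` before a single bulk insert, looks roughly like this. The conform logic below is a generic stand-in, not the actual target-mysql code, so treat it as a sketch of the shape of the suspected hot loop:

```python
from datetime import date, datetime

def conform_value(value):
    # Generic stand-in for per-value conforming to MySQL types
    # (the real target-mysql logic differs).
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    return value

def build_insert_records(records, columns):
    insert_records = []
    for record in records:
        # Per-record dict rebuild -- the loop suspected of
        # dominating runtime for large batches.
        insert_record = {col: conform_value(record.get(col)) for col in columns}
        insert_records.append(insert_record)
    # In the real target, insert_records would now be handed to
    # SQLAlchemy's bulk insert as one statement.
    return insert_records

rows = build_insert_records(
    [{"id": 1, "created_at": date(2024, 1, 1)}], ["id", "created_at"]
)
print(rows)  # [{'id': 1, 'created_at': '2024-01-01'}]
```

Profiling a single batch through this loop (e.g. with `cProfile`) would confirm whether conforming or the insert itself is the bottleneck.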
šŸ™ 1