# troubleshooting
Hello! I have a question about extract-load performance. I set up tap-oracle and target-mysql, started the pipeline, and realized the speed is ridiculously slow: about 10k rows in 10 minutes, with a result table size of 2.3 MB.

• The MySQL target database and Meltano run on the same host, so it's probably not a network speed issue.
• I tried swapping target-mysql for target-jsonl, and performance was way better: it took about 1 second to load 10k rows to disk.

Is there any way to improve performance?
That seems pretty bad in terms of performance. I don't think we have enough information to say exactly what may be going wrong with your test, and tbh I have never had to use target-mysql. Would you be able to run the extract-load at log level debug and share the log output? I'm not sure where the system is bottlenecked. Looking at the target-mysql codebase, it seems that data loading is performed here, where the library leverages SQLAlchemy's bulk insert operation. The actual query that gets executed is generated by this inherited method, `generate_insert_statement`, which emits an `INSERT INTO ...` statement.
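For illustration, the bulk-insert path described above can be sketched with the standard-library `sqlite3` driver standing in for the MySQL/SQLAlchemy stack; the table and column names here are made up, not taken from target-mysql:

```python
import sqlite3

# In-memory database standing in for the MySQL target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

records = [(i, i * 1.5) for i in range(10_000)]

# One prepared INSERT executed for the whole batch -- the
# executemany-style path a bulk insert takes, rather than
# issuing 10k separate single-row statements.
conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", records)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 10000
```

If the log shows one statement per row instead of batched inserts like this, that alone could account for the slowdown.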
šŸ™ 1
I'm not sure if that explains your extremely long load times, but it's possible to optimize that by instead loading the data via MySQL's `LOAD DATA` statement.
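A minimal sketch of that approach: dump the batch to a CSV file and hand it to the server in one pass. `LOAD DATA LOCAL INFILE` is real MySQL syntax, but the table and column names below are hypothetical, and the statement is only printed here, not executed against a live server:

```python
import csv
import tempfile

# Hypothetical batch of records; column names are illustrative.
records = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 4.50}]
columns = ["id", "amount"]

# Write the batch to a CSV file the MySQL client can stream to the server.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as f:
    writer = csv.DictWriter(f, fieldnames=columns)
    writer.writerows(records)
    csv_path = f.name

# You would run this through your MySQL driver with local_infile enabled;
# it ingests the whole file server-side instead of row-by-row inserts.
stmt = (
    f"LOAD DATA LOCAL INFILE '{csv_path}' "
    "INTO TABLE orders "
    "FIELDS TERMINATED BY ',' "
    f"({', '.join(columns)})"
)
print(stmt)
```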
šŸ™ 1
I suspect the above doesn't fully explain the slowness, and there may be a bottleneck elsewhere within target-mysql. Still looking within the `bulk_insert_records` method, I wonder if the code to conform records for MySQL (lines 703 - 714) is taking up a lot of time, because for each `record` the loop constructs a new `insert_record`, appends it to a list `insert_records`, and then uses that list to emit the query.
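The pattern being described, a per-record conform loop that builds `insert_records` before a single bulk insert, looks roughly like this. The conform logic below is a generic stand-in, not the actual target-mysql code, so treat it as a sketch of the shape of the suspected hot loop:

```python
from datetime import date, datetime

def conform_value(value):
    # Generic stand-in for per-value conforming to MySQL types
    # (the real target-mysql logic differs).
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    return value

def build_insert_records(records, columns):
    insert_records = []
    for record in records:
        # Per-record dict rebuild -- the loop suspected of
        # dominating runtime for large batches.
        insert_record = {col: conform_value(record.get(col)) for col in columns}
        insert_records.append(insert_record)
    # In the real target, insert_records would now be handed to
    # SQLAlchemy's bulk insert as one statement.
    return insert_records

rows = build_insert_records(
    [{"id": 1, "created_at": date(2024, 1, 1)}], ["id", "created_at"]
)
print(rows)  # [{'id': 1, 'created_at': '2024-01-01'}]
```

Profiling a single batch through this loop (e.g. with `cProfile`) would confirm whether conforming or the insert itself is the bottleneck.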
šŸ™ 1