Pawel Plaszczak
02/06/2025, 3:09 PM
meltano run tap-oracle target-oracle
I have tested most basic data types (VARCHAR, DATE, various numeric types) on small and mid-sized tables: 30 columns and a few million rows. My tests do not cover all cases: most of my tables are small, averaging about 1 KB of data per row. I have also successfully tested a table with small CLOBs (a CLOB is a UTF-8 string that can potentially be very long, similar to a long VARCHAR).
Observation: the tests are reasonable but in general a bit slow. I have observed between 200 and 2,500 rows per second, and about 1 MB/s. That second number is independent of table structure: all my tables are copied at roughly 1 MB/s, which seems very slow. During the run I see rather low CPU usage (sometimes reaching 100%, but usually below 10%) and very low memory usage. I also see that Meltano is pushing data to the target in batches of 10,000 rows. I could not find out whether this or other parameters are configurable. For comparison, our old SAS ETL is twice as fast in most cases. What could be the cause of this slow speed, how can I debug it, and how can I change some parameters to experiment?
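So far the only debugging step I know of is raising the log verbosity, roughly like this (I am assuming the debug output shows per-batch progress, but I have not verified what it actually contains):

    meltano --log-level=debug run tap-oracle target-oracle

Is that the right starting point, or is there a better way to see where the time is spent?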
Then I tested a table with a large CLOB column, where each record could reach a few megabytes of data. This was a complete failure:
The entire ETL (meltano run tap-oracle target-oracle) took 30 minutes to push the first batch of 10,000 rows. [Update: I later checked that it was slow simply because there was more data per row, while the average copying speed of 1 MB/s was still maintained. So in fact, copying CLOBs was not slower than other data types.] While this could possibly be reasonable, it would be good to be able to limit the batch size. I noted the count was 0, and then after some 25 minutes it jumped to 10,000. I think for large records it would be reasonable to enforce small batches. How? Is this parameter down to SQLAlchemy, or can I configure it at the Meltano level?
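For example, is something along these lines in meltano.yml the intended mechanism? This is just a sketch; batch_size_rows is a setting I have seen on other Meltano SDK based targets, and I am not sure my target-oracle variant actually exposes it:

    plugins:
      loaders:
        - name: target-oracle
          config:
            # assumed setting name, not confirmed for this target:
            # cap each batch at 1,000 rows instead of the default 10,000
            batch_size_rows: 1000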
Hayden Ness
02/06/2025, 10:56 PM
Pawel Plaszczak
02/07/2025, 4:37 PM