# troubleshooting
j
Hi everyone; question about E+L speeds. I have been running tap-mssql (BuzzCutNorman variant) and target-snowflake (Meltano variant) over the past month, and I am getting to the point now where I am using one of my smaller clients to load data into Snowflake (about a 57 GB database). For some of my "bigger" tables (1.4 million rows or so), it's taken about 18 minutes to load (~80,000 rows/min). While for a smaller client this speed is not too bad (although the amount of logs I am seeing is huge!), I have bigger clients whose same table has about 40 million rows. At the same speed, it'd take me about 9 to 10 hours to load that single table. I know there are efforts right now to address the batching side of things and raise the default size of 10K rows per batch: https://github.com/meltano/sdk/pull/1876 . Are there any recommendations or thoughts as to what I can do on my end in the meantime to increase the speed?
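In case it's useful context, here's roughly what I'm hoping to be able to do once that lands. This is a hypothetical sketch: the `batch_size_rows` setting name and the 100K value are my guesses pending the PR, not a documented option yet.

```bash
# Hypothetical: once batch size becomes configurable on SDK-based targets,
# bump the per-batch row count on the loader (setting name/value are guesses):
meltano config target-snowflake set batch_size_rows 100000
```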
v
https://sdk.meltano.com/en/latest/dev_guide.html#testing-performance is where I'd start if I were worried about performance!
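For example, you can profile a standalone tap run to see where the time actually goes. A minimal sketch, assuming `tap-mssql` is installed as a Python console script and you have a `config.json` handy:

```bash
# Run the tap by itself under cProfile, discarding the record output,
# so the numbers reflect extraction cost only (no loader involved).
python -m cProfile -o tap.prof "$(which tap-mssql)" --config config.json > /dev/null

# Browse the hot spots interactively afterwards:
python -m pstats tap.prof
```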
The way I handle this a lot of the time is just by running individual tables one at a time, and then running a bunch of those runs in parallel
Not the best, since we could do better at the per-table level, but I haven't needed to squeeze out more performance
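Roughly, that looks like one `meltano elt` invocation per big table, each with its own state, all backgrounded. A sketch, where the stream names are placeholders and, depending on your Meltano version, the state flag may be `--state-id` rather than `--job_id`:

```bash
# Each run selects a single large stream and keeps its own bookmark state.
meltano elt tap-mssql target-snowflake --select dbo-big_table_1 --job_id big_table_1 &
meltano elt tap-mssql target-snowflake --select dbo-big_table_2 --job_id big_table_2 &
wait  # block until both background runs finish
```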
j
By which you mean run the known large tables individually, and then everything else that is smaller/more manageable can be run in parallel?
v
That works as well!
Both work, you just have to pick the strategy that works best for you
s
Have you tried other taps, e.g. the pipelinewise variant https://github.com/wintersrd/pipelinewise-tap-mssql ? I pushed a change a few days ago into this tap which lets you adjust the cursor_array_size, which increases the tap's performance quite a lot.
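If you try that variant, the idea would be something like the following in your Meltano project. A sketch: the `10000` value is illustrative, so check the tap's README for the actual default and sensible ranges.

```bash
# Assuming the extractor is pointed at the pipelinewise-tap-mssql variant,
# raise the number of rows fetched per cursor round-trip:
meltano config tap-mssql set cursor_array_size 10000
```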