janis_puris
06/27/2023, 1:37 PM
`target-snowflake` process, while `tap-oracle` is pretty chill (the EC2 instance this is running on has 2 CPUs).
What is `target-snowflake` doing that makes this throttle so much? It looks starved of CPU.
 PID USER    PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
2867 ubuntu  20   0 2443764   1.8g 52656 R 100.0 46.7  22:30.11 target-snowflak
2855 ubuntu  20   0  283868 116956 16364 S   6.7  2.9   3:06.67 meltano
2865 ubuntu  20   0   62424  42176 18564 R   6.3  1.1   3:41.81 tap-oracle
The massive CPU usage is visible only while data is coming in. While the Oracle query cursor is "starting up", there is no CPU usage at all on `target-snowflake`.
Any ideas what I can do to make this speed up? Do I need to throw bigger cores at the EC2 (as it seems single-thread bound)?
Logs in thread 🧵
janis_puris
06/27/2023, 1:39 PM
janis_puris
06/27/2023, 2:47 PM
`tap-oracle` is not so easy on the CPU anymore.
This is not documented on the tap.
Relevant PR is [AP-953] Add parquet support #149
janis_puris
06/27/2023, 2:48 PM
mark_johnston
06/28/2023, 9:33 PM
`tap-oracle` to `target-snowflake` so it's interesting to see your results. We would also like to improve performance, but we've managed to get some pretty good results using the pipelinewise variant of `target-snowflake`:
https://github.com/transferwise/pipelinewise-target-snowflake
I also have a fork of this which adds some functionality and removes things like per-value timestamp adjustment, which you shouldn't need with a database source that provides timestamps in the same format each time:
https://github.com/mjsqu/pipelinewise-target-snowflake
mark_johnston
06/28/2023, 9:35 PM
`cProfile` to pick out any unnecessary or repeated function calls
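For anyone following along, here is a minimal sketch of what a `cProfile` run for this purpose could look like. It is not taken from `target-snowflake` or any of its variants: `process_line` and `profile_records` are hypothetical names, and the deliberately redundant second `json.loads` call only illustrates the kind of repeated work a profile tends to surface.

```python
import cProfile
import io
import json
import pstats


def process_line(line):
    """Hypothetical stand-in for a target's per-record work (not real target code)."""
    message = json.loads(line)
    # Deliberately redundant second parse: the kind of repeated call a profile exposes.
    json.loads(line)
    return message["record"]


def profile_records(lines, top=5):
    """Profile process_line over the given lines; return (records, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    records = [process_line(line) for line in lines]
    profiler.disable()

    # Render the stats into a string, sorted by cumulative time per function.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return records, buffer.getvalue()


if __name__ == "__main__":
    # Synthetic Singer-style RECORD messages, made up for this illustration.
    sample = [
        json.dumps({"type": "RECORD", "record": {"id": i}})
        for i in range(10_000)
    ]
    _, report = profile_records(sample)
    print(report)
```

In the report, the call count for `json.loads` comes out at twice that of `process_line`, which is the sort of signal that points at unnecessary or repeated function calls.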