Hey there I'm testing meltano and there is a pipel...
# troubleshooting
m
Hey there, I'm testing Meltano with a pipeline from MySQL to BigQuery. One table has 4 million rows and maybe twenty columns, and so far the initial sync is taking around 3-4 hours. I'm running it locally on an M1 Mac with 16 GB of memory. Is this expected?
a
Hi, @minh. There are several factors that can be at play here. The big ones are:
1. Speed of the tap
2. Speed of the target
3. Network latency
If this is something you'll need to repeat often (for instance, if you need to run FULL_TABLE syncs regularly), then it's probably worth debugging the process for bottlenecks. If the question is just "shouldn't it be faster?", the answer is probably "yes". There are a few things we can do to help troubleshoot:
1. Run the tap on its own and save its output to a file, recording timings as you go.
2. Run the target on its own, feeding it the tap's data directly from that output file.
Those steps will tell you which part of the process is the bottleneck, and would lead to next questions around tuning batch sizes, checking network latency, etc.
Just as an example: if the tap is slow and you are running locally, it might be that the connection between your laptop and the MySQL server is slow. In that case, running the same flow from a container in the cloud may significantly improve performance. If the target is slow, it could be that it is flushing its cache too often or that the batch size is too small...
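For what it's worth, here is a rough sketch of how the two isolation steps could be timed from a small Python script. It assumes the plugins are named tap-mysql and target-bigquery in your project and that they can be run with `meltano invoke`; the helper names (`time_stage`, `profile_pipeline`) and the dump file name are made up for illustration, so adjust them to your setup.

```python
import subprocess
import time
from typing import Optional, Sequence


def time_stage(cmd: Sequence[str],
               stdin_path: Optional[str] = None,
               stdout_path: Optional[str] = None) -> float:
    """Run one command, optionally wiring files to stdin/stdout.

    Returns the wall-clock time in seconds. Raises CalledProcessError
    if the command exits with a non-zero status.
    """
    stdin = open(stdin_path, "rb") if stdin_path else None
    stdout = open(stdout_path, "wb") if stdout_path else None
    try:
        start = time.perf_counter()
        subprocess.run(cmd, stdin=stdin, stdout=stdout, check=True)
        return time.perf_counter() - start
    finally:
        for handle in (stdin, stdout):
            if handle is not None:
                handle.close()


def profile_pipeline(tap: str = "tap-mysql",            # assumed plugin name
                     target: str = "target-bigquery",   # assumed plugin name
                     dump_path: str = "tap_output.jsonl") -> dict:
    """Time the tap and the target separately, using a file in between."""
    # Step 1: run the tap on its own and save its output to a file.
    tap_seconds = time_stage(["meltano", "invoke", tap], stdout_path=dump_path)
    # Step 2: run the target on its own, reading the saved tap output.
    target_seconds = time_stage(["meltano", "invoke", target], stdin_path=dump_path)
    return {"tap_seconds": tap_seconds, "target_seconds": target_seconds}
```

Calling `profile_pipeline()` from inside the Meltano project directory should give two timings to compare: whichever stage dominates is the one to look at first. Note that step 2 really loads the saved records into the target, so point it at a scratch dataset if that matters.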
m
Got it, thanks! @aaronsteers right now I'm doing an initial sync from Meltano, and after that it'll be a LOG_BASED sync. I'll continue making notes as I try out your suggestion!
a
👍 thank you