# troubleshooting
c
Hello all, we are using Meltano to dump data from a big monolithic MySQL database to Snowflake, and unfortunately the performance is very poor. We are using binlog replication and have already tried a lot of things (S3 stage, different file formats, parallelism). If we take just one medium-sized table (1.5 GB, ~8 million rows), the initial sync takes 40 minutes, and a daily refresh via binlog takes more than an hour. As our database is 500 GB, we currently don't see a solution. How do you guys load data into Snowflake from MySQL? Is the approach simply not feasible for large monoliths?
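For context, our setup looks roughly like this in meltano.yml (a minimal sketch with placeholder hosts and names, not our exact config; secrets come from the environment):

```yaml
# meltano.yml (excerpt) -- placeholder values, not the real config
plugins:
  extractors:
    - name: tap-mysql
      variant: transferwise
      config:
        host: mysql.internal.example.com   # placeholder
        port: 3306
        user: replication_user
      metadata:
        '*':
          replication-method: LOG_BASED    # binlog-based incremental sync
  loaders:
    - name: target-snowflake
      variant: transferwise
      config:
        account: xy12345.eu-central-1      # placeholder
        dbname: ANALYTICS
        warehouse: LOADING_WH
        default_target_schema: RAW
        # knobs we experimented with for throughput:
        s3_bucket: our-meltano-stage       # external S3 stage
        stage: RAW.MELTANO_STAGE
        file_format: RAW.CSV_FORMAT
        parallelism: 8
        batch_size_rows: 100000
```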
v
I know about https://transferwise.github.io/pipelinewise/user_guide/fastsync.html, and Meltano has an issue for it here: https://gitlab.com/meltano/sdk/-/issues/9. I also wonder how much performance is left to squeeze out of the Singer spec itself (I think there's a ton, maybe 50-75%, but I'm basing that on data from target-mssql, which I wrote; you have to optimize target by target).
PipelineWise's FastSync may be a good thing to try for DB -> DB.
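If you go that route, a PipelineWise tap definition for MySQL -> Snowflake looks roughly like this (a sketch from memory with placeholder values, so check the PipelineWise docs for the exact keys). FastSync kicks in automatically for the initial load and full resyncs of supported tap/target pairs, and binlog replication takes over afterwards:

```yaml
# tap_mysql.yml (sketch, placeholder values)
id: "mysql_monolith"
name: "MySQL monolith"
type: "tap-mysql"
target: "snowflake"            # id of the target defined elsewhere
db_conn:
  host: "mysql.internal.example.com"
  port: 3306
  user: "replication_user"
  password: "<secret>"
  dbname: "shop"
batch_size_rows: 100000
schemas:
  - source_schema: "shop"
    target_schema: "raw"
    tables:
      - table_name: "orders"
        replication_method: "LOG_BASED"   # binlog after the FastSync'd initial load
```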
c
@visch thanks for your answer. We are using https://github.com/transferwise/pipelinewise-tap-mysql in our setup. I was not really aware of the details of the Singer spec. It seems there is no way we can make that spec work for larger databases. FastSync seems to be the only way for us here, thanks for that link.
j
This is odd to me. I have the exact same scenario (same tap and target) with data just as large, but I only see those super long load times on my personal machine. When I run it in our stage and prod environments in AWS, it runs relatively quickly.
v
@josh_lloyd if you download from AWS and have to re-upload to AWS, that would make sense
j
true, that is our situation
c
@josh_lloyd our Meltano is running on-prem, so that might be the difference here. But may I ask how much data you transfer, and how long the initial full load and the incremental sync take for prod?