# troubleshooting
c
Hello all, we are using Meltano to dump data from a big monolithic MySQL database to Snowflake, and unfortunately the performance is very poor. We are using binlog replication and have already tried a lot of things (S3 stage, different file formats, parallelism). If we take just one medium-sized table (1.5 GB, ~8 million rows), the initial sync takes 40 minutes, and a daily refresh via binlog takes more than an hour. As our database is 500 GB, we currently don't see a solution. How do you guys load data into Snowflake from MySQL? Is the approach simply not feasible for large monoliths?
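For context, our setup looks roughly like this in meltano.yml (a minimal sketch with placeholder hosts and names, not our exact config; secrets come from the environment):

```yaml
# meltano.yml (excerpt) -- placeholder values, not the real config
plugins:
  extractors:
    - name: tap-mysql
      variant: transferwise
      config:
        host: mysql.internal.example.com   # placeholder
        port: 3306
        user: replication_user
      metadata:
        '*':
          replication-method: LOG_BASED    # binlog-based incremental sync
  loaders:
    - name: target-snowflake
      variant: transferwise
      config:
        account: xy12345.eu-central-1      # placeholder
        dbname: ANALYTICS
        warehouse: LOADING_WH
        default_target_schema: RAW
        # knobs we experimented with for throughput:
        s3_bucket: our-meltano-stage       # external S3 stage
        stage: RAW.MELTANO_STAGE
        file_format: RAW.CSV_FORMAT
        parallelism: 8
        batch_size_rows: 100000
```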
v
I know about https://transferwise.github.io/pipelinewise/user_guide/fastsync.html, and Meltano has an issue for it here: https://gitlab.com/meltano/sdk/-/issues/9. I also wonder how much performance is left to squeeze out of the Singer spec itself (I think there's a ton, maybe 50-75%, but I'm basing that on data from target-mssql, which I wrote; you have to optimize target by target).
PipelineWise's FastSync may be a good thing to try for DB -> DB.
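If you go that route, a PipelineWise tap definition for MySQL -> Snowflake looks roughly like this (a sketch from memory with placeholder values, so check the PipelineWise docs for the exact keys). FastSync kicks in automatically for the initial load and full resyncs of supported tap/target pairs, and binlog replication takes over afterwards:

```yaml
# tap_mysql.yml (sketch, placeholder values)
id: "mysql_monolith"
name: "MySQL monolith"
type: "tap-mysql"
target: "snowflake"            # id of the target defined elsewhere
db_conn:
  host: "mysql.internal.example.com"
  port: 3306
  user: "replication_user"
  password: "<secret>"
  dbname: "shop"
batch_size_rows: 100000
schemas:
  - source_schema: "shop"
    target_schema: "raw"
    tables:
      - table_name: "orders"
        replication_method: "LOG_BASED"   # binlog after the FastSync'd initial load
```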
c
@visch thanks for your answer. We are using https://github.com/transferwise/pipelinewise-tap-mysql in our setup. I was not really aware of the details of the Singer spec. It seems there is no way we can make that spec work for larger databases. FastSync seems to be the only way for us here, thanks for that link.
j
This is odd to me. I have the exact same scenario (same tap and target) with data just as large, but I only see those super long load times on my personal machine. When I run it in our stage and prod environments in AWS, it runs relatively quickly.
v
@josh_lloyd if you download from AWS and have to re-upload to AWS, that would make sense
j
true, that is our situation
c
@josh_lloyd our Meltano is running on-prem, so that might be the difference here. But may I ask how much data you transfer, and how long the initial full load and the incremental sync take for prod?