# getting-started
d
Good morning all, I'm new to using Meltano. I've built out a POC using the transferwise variant of tap-postgres as my extractor and the transferwise variant of target-snowflake as my loader. I read in the documentation that Fast Sync is automatically baked in when going from Postgres to Snowflake. However, when running Meltano against larger tables I'm noticing it takes ~15+ hours to replicate the data into Snowflake on the initial run. Is there anything special I have to add to my meltano.yml to activate Fast Sync?
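For reference, a minimal sketch of the setup described above, using the standard Meltano CLI (the project name is a placeholder, and connection settings are omitted):

```sh
# Sketch of the POC described above; connection settings would still need to
# be supplied via `meltano config` or environment variables.
meltano init postgres-to-snowflake-poc
cd postgres-to-snowflake-poc

# Add the transferwise variants of the extractor and loader.
meltano add extractor tap-postgres --variant transferwise
meltano add loader target-snowflake --variant transferwise

# The first run performs the (slow) initial full load; later runs are incremental.
meltano run tap-postgres target-snowflake
```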
t
Fast Sync is a PipelineWise feature. The taps and targets produced by PipelineWise make use of it, but Meltano does not.
There are a lot of threads here about initial loads being slow. There are ways to work around the slowness by manually loading the tables and setting state, but nothing official. I know there's some work being done by Meltano on a "batch mode" or something like that, but I don't know how close that is to actually being usable.
d
@thomas_briggs thanks for the quick response! When you say manually loading the table, do you mean outside of Meltano? Or can this manual workflow be done with Meltano?
t
Outside of Meltano 😕
Conceptually the steps are:
1. Capture the "binlog position" for the table in question. ("Binlog position" is a MySQL term; the PostgreSQL equivalent is the WAL LSN. Look at the tap's code to see exactly what it records.)
2. Dump the data from the source DB using that DB's own tools (psql, or maybe pg_dump, in your case).
3. Import the data into the destination DB using that DB's own tools.
4. Manually set the state of the table in Meltano's internal DB:
   a. Create a state.json that specifies the "binlog position" for the table in question. Run `meltano state get some_id` to see what the state data is supposed to look like.
   b. Run `meltano state set some_id --input-file state.json` to load the state for the table into Meltano's internal DB.
From there, Meltano will read changes to the table from the source and push them to the destination as if it had loaded the table itself from the beginning. A concrete sketch of all four steps follows below.
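A sketch of those four steps, assuming a table `public.my_table` and a state ID of `dev:tap-postgres-to-target-snowflake` (both hypothetical, as are the bookmark keys; copy the real ones from your own `meltano state get` output):

```sh
# 1. Capture the current WAL LSN from Postgres (PG 10+; older versions use
#    pg_current_xlog_location() instead).
psql -h src-host -d appdb -c "SELECT pg_current_wal_lsn();"

# 2. Dump the table's data from the source with Postgres's own tools.
pg_dump -h src-host -d appdb --table=public.my_table --data-only -f my_table.sql

# 3. Load the data into Snowflake with Snowflake's own tools (e.g. snowsql and
#    COPY INTO from a stage; details omitted here).

# 4a. Inspect an existing state payload to see the expected shape, then write a
#     state.json carrying the LSN captured in step 1.
meltano state get dev:tap-postgres-to-target-snowflake
cat > state.json <<'EOF'
{"singer_state": {"bookmarks": {"public-my_table": {"lsn": 123456789, "version": 1}}}}
EOF

# 4b. Load that state into Meltano's internal DB.
meltano state set dev:tap-postgres-to-target-snowflake --input-file state.json
```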
Actually, between steps 2 and 3 you have to create the table in the destination too 😉 To make all this a little smoother you can create a copy of the table in question that contains only one row, let Meltano replicate that normally, then a) grab the state data for the copy as a template for the real table, and b) grab its CREATE TABLE to create the real table (see the sketch below).
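A sketch of that one-row trick, again with hypothetical names:

```sh
# Create a one-row copy of the real table in the source DB.
psql -h src-host -d appdb -c \
  "CREATE TABLE public.my_table_seed AS SELECT * FROM public.my_table LIMIT 1;"

# Let Meltano replicate the seed table normally (it has to be selected in the
# tap's configuration first).
meltano run tap-postgres target-snowflake

# a) Grab the resulting state to use as a template for the real table.
meltano state get dev:tap-postgres-to-target-snowflake

# b) Grab the generated DDL from Snowflake (rename my_table_seed to my_table
#    before running it against the destination).
snowsql -q "SELECT GET_DDL('TABLE', 'ANALYTICS.PUBLIC.MY_TABLE_SEED');"
```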
It's an unfortunate problem to have to wrestle with when first learning the tool. 😞 On the bright side, working through all of this will teach you a lot about how Meltano works...
d
Thank you so much for this!! I'll take this and try to bake it into my pipeline!
t
You are welcome. 🙂 Glad I could help!