# getting-started
d
Good morning all, I'm new to using Meltano. I've built out a POC using the transferwise variant of tap-postgres as my extractor and the transferwise variant of target-snowflake as my loader. I read in the documentation that Fast Sync is automatically baked in when going from Postgres to Snowflake. However, when running Meltano against larger tables I'm noticing it takes ~15+ hours to replicate the data into Snowflake on the initial run. Is there anything special I have to add to my meltano.yml to activate Fast Sync?
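For reference, a minimal sketch of the setup described above, using the standard Meltano CLI (the project name is a placeholder, and connection settings are omitted):

```sh
# Sketch of the POC described above; connection settings would still need to
# be supplied via `meltano config` or environment variables.
meltano init postgres-to-snowflake-poc
cd postgres-to-snowflake-poc

# Add the transferwise variants of the extractor and loader.
meltano add extractor tap-postgres --variant transferwise
meltano add loader target-snowflake --variant transferwise

# The first run performs the (slow) initial full load; later runs are incremental.
meltano run tap-postgres target-snowflake
```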
t
Fast Sync is a PipelineWise feature. The taps and targets produced by PipelineWise make use of it, but Meltano does not.
There are a lot of threads here about initial loads being slow. There are ways to work around the slowness by manually loading the tables and setting state, but nothing official. I know there's some work being done by Meltano on a "batch mode" or something like that, but I don't know how close that is to actually being usable.
d
@thomas_briggs thanks for the quick response! When you say manually loading the table, do you mean outside of Meltano? Or can this manual workflow be done with Meltano?
t
Outside of Meltano 😕
Conceptually the steps are:
1. Capture the "binlog position" for the table in question. ("Binlog position" is a MySQL term; the PostgreSQL equivalent is the WAL LSN. Look at the tap's code to see exactly what it records.)
2. Dump the data from the source DB using that DB's own tools (psql, or maybe pg_dump, in your case).
3. Import the data into the destination DB using that DB's own tools.
4. Manually set the state of the table in Meltano's internal DB:
   a. Create a state.json that specifies the "binlog position" for the table in question. Run `meltano state get some_id` to see what the state data is supposed to look like.
   b. Run `meltano state set some_id --input-file state.json` to load the state for the table into Meltano's internal DB.
From there, Meltano will read changes to the table from the source and push them to the destination as if it had loaded the table itself from the beginning. A concrete sketch of all four steps follows below.
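A sketch of those four steps, assuming a table `public.my_table` and a state ID of `dev:tap-postgres-to-target-snowflake` (both hypothetical, as are the bookmark keys; copy the real ones from your own `meltano state get` output):

```sh
# 1. Capture the current WAL LSN from Postgres (PG 10+; older versions use
#    pg_current_xlog_location() instead).
psql -h src-host -d appdb -c "SELECT pg_current_wal_lsn();"

# 2. Dump the table's data from the source with Postgres's own tools.
pg_dump -h src-host -d appdb --table=public.my_table --data-only -f my_table.sql

# 3. Load the data into Snowflake with Snowflake's own tools (e.g. snowsql and
#    COPY INTO from a stage; details omitted here).

# 4a. Inspect an existing state payload to see the expected shape, then write a
#     state.json carrying the LSN captured in step 1.
meltano state get dev:tap-postgres-to-target-snowflake
cat > state.json <<'EOF'
{"singer_state": {"bookmarks": {"public-my_table": {"lsn": 123456789, "version": 1}}}}
EOF

# 4b. Load that state into Meltano's internal DB.
meltano state set dev:tap-postgres-to-target-snowflake --input-file state.json
```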
Actually, between steps 2 and 3 you have to create the table in the destination too 😉 To make all this a little smoother you can create a copy of the table in question that contains only one row, let Meltano replicate that normally, then a) grab the state data for the copy as a template for the real table, and b) grab its CREATE TABLE to create the real table (see the sketch below).
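A sketch of that one-row trick, again with hypothetical names:

```sh
# Create a one-row copy of the real table in the source DB.
psql -h src-host -d appdb -c \
  "CREATE TABLE public.my_table_seed AS SELECT * FROM public.my_table LIMIT 1;"

# Let Meltano replicate the seed table normally (it has to be selected in the
# tap's configuration first).
meltano run tap-postgres target-snowflake

# a) Grab the resulting state to use as a template for the real table.
meltano state get dev:tap-postgres-to-target-snowflake

# b) Grab the generated DDL from Snowflake (rename my_table_seed to my_table
#    before running it against the destination).
snowsql -q "SELECT GET_DDL('TABLE', 'ANALYTICS.PUBLIC.MY_TABLE_SEED');"
```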
It's an unfortunate problem to have to wrestle with when first learning the tool. 😞 On the bright side, working through all of this will teach you a lot about how Meltano works...
d
Thank you so much for this!! I'll take this and try to bake it into my pipeline!
t
You are welcome. 🙂 Glad I could help!