I am using the default tap postgres and it does not seem to Meltano #getting-started


jaye_howell

12/15/2022, 11:09 PM

I am using the default tap-postgres and it does not seem to have a start_date option. What is the best way to load large tables? I had considered creating a static backup, loading the historical data from that, and then switching over to the live database for the last bit of updating, but I don't see a way to control that. Any advice on large tables, especially ones on a hot standby that are updated frequently (the operation receives data pretty much 24/7)?

christoph

12/16/2022, 12:39 AM

I can think of two general pathways:
1. Switch to a pure Meltano SDK based tap + target (and potentially look at improving the initial load duration via the 'BATCH' support in the Meltano SDK based taps and targets).
2. Run the default pipelinewise tap-postgres in 'INCREMENTAL' mode (since 'LOG_BASED' won't be available in your case with a standby node) and try to manually edit the tap-postgres Singer bookmarks in the Meltano "state" storage ....
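To make the second pathway concrete, here is a minimal sketch of what key-based 'INCREMENTAL' replication with a bookmark amounts to. It is not tap-postgres code: it uses an in-memory SQLite table, and the `events` table, the `updated_at` replication key, and the `fetch_since` helper are all hypothetical. The `>=` comparison is an assumption about how such taps typically resume.

```python
import sqlite3


def fetch_since(conn, bookmark):
    """Return rows whose replication key is at or after the bookmark, oldest first."""
    cur = conn.execute(
        "SELECT id, updated_at FROM events WHERE updated_at >= ? ORDER BY updated_at",
        (bookmark,),
    )
    return cur.fetchall()


# Hypothetical source table standing in for a large Postgres table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2022-12-01T00:00:00"),
        (2, "2022-12-10T00:00:00"),
        (3, "2022-12-15T00:00:00"),
    ],
)

# A manually edited bookmark: everything before it is assumed to be loaded already.
bookmark = "2022-12-10T00:00:00"
rows = fetch_since(conn, bookmark)

# After a run, the bookmark advances to the highest replication key value seen.
new_bookmark = rows[-1][1] if rows else bookmark
```

Moving the bookmark forward by hand is what lets a bulk-loaded history be skipped on the first incremental run.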

jaye_howell

12/16/2022, 12:40 AM

Related....in the Postgres incremental key-based sync, you set the replication key. how is that stored and can you alter it for a run?

christoph

12/16/2022, 12:40 AM

(Slightly related but outdated thread on a similar topic. From before the tap-postgres Meltano variant existed and before the Meltano SDK had 'BATCH' message support. https://meltano.slack.com/archives/CMN8HELB0/p1634133762289700)

christoph

12/16/2022, 12:41 AM

Related....in the Postgres incremental key-based sync, you set the replication key. how is that stored and can you alter it for a run?

Ha. Looks like you're already going down one of the routes ... 😉 ... 'meltano state list' and 'meltano state get' are your friends for inspecting those 'bookmarks'.
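As a rough sketch of inspecting a bookmark from a script, the snippet below builds the `meltano state get` command line and pulls one stream's bookmark out of the JSON it prints. The state ID `dev:tap-postgres-to-target-postgres`, the stream name `public-events`, and the sample JSON layout (a top-level `singer_state` key holding `bookmarks`) are assumptions for illustration; check the real output of `meltano state list` and `meltano state get` in your project.

```python
import json


def state_get_command(state_id):
    """Build the argument list for `meltano state get <state_id>`."""
    return ["meltano", "state", "get", state_id]


def extract_bookmark(state_json, stream):
    """Return the bookmark dict for one stream, or None if it is absent."""
    state = json.loads(state_json)
    return state.get("singer_state", {}).get("bookmarks", {}).get(stream)


# Hypothetical output of `meltano state get`, not captured from a real run.
sample_output = json.dumps(
    {
        "singer_state": {
            "bookmarks": {
                "public-events": {
                    "replication_key": "updated_at",
                    "replication_key_value": "2022-12-15T00:00:00",
                }
            }
        }
    }
)

cmd = state_get_command("dev:tap-postgres-to-target-postgres")
bookmark = extract_bookmark(sample_output, "public-events")
```

In a real project, `cmd` could be passed to `subprocess.run(cmd, capture_output=True, text=True)` and its standard output fed to `extract_bookmark`.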

thomas_briggs

12/16/2022, 2:03 PM

We initialize large tables by dumping the data to a file using the database's native tools (MySQL, in our case), loading that data into the target using the target's native tools, then setting Meltano's state data such that it resumes from the "end" of the manually loaded data. We used to do that by tweaking the data in the internal Meltano database (.meltano/meltano.db, which is a SQLite DB), but the 'meltano state' commands are now a better option.
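The last step of that workflow, pointing the state at the "end" of the manually loaded data, could be sketched as follows. This is an illustration under assumptions, not a verified recipe: the state ID, the stream name, the replication key, and the exact bookmark fields a given tap expects are hypothetical, so compare against what `meltano state get` shows after a small test run.

```python
import json


def build_state_payload(stream, replication_key, value):
    """Build a Singer-style state payload with one bookmark for one stream."""
    return {
        "singer_state": {
            "bookmarks": {
                stream: {
                    "replication_key": replication_key,
                    "replication_key_value": value,
                }
            }
        }
    }


def state_set_command(state_id, payload):
    """Build the argument list for `meltano state set <state_id> '<json>'`."""
    return ["meltano", "state", "set", state_id, json.dumps(payload)]


# Highest replication key value contained in the bulk-loaded dump (hypothetical).
payload = build_state_payload("public-events", "updated_at", "2022-12-15T00:00:00")
cmd = state_set_command("dev:tap-postgres-to-target-postgres", payload)
```

Running `cmd` with `subprocess.run(cmd, check=True)` would overwrite the stored state for that ID, and the command may ask for confirmation, so it is worth trying on a small table first.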

jaye_howell

12/16/2022, 5:48 PM

That is the approach we are taking as well. I just didn't know about the state commands, but found them late last night. So now I'm testing a small load to get the process correct.
