I am using the default tap-postgres and it does no...
# getting-started
j
I am using the default tap-postgres and it does not seem to have a start_date option. What is the best way to load large tables? I had considered creating a static backup, loading the historical data from that, then switching over to live updates for the remainder, but I don't see a way to control that. Any advice on large tables, especially ones on a hot standby that are updated frequently (the operation receives data pretty much 24/7)?
c
I can think of two general pathways: 1. Switch to a pure Meltano SDK based tap + target (and potentially look at improving the initial load duration via the 'BATCH' support in Meltano SDK based taps and targets). 2. Run the default pipelinewise tap-postgres in 'INCREMENTAL' mode (since 'LOG_BASED' won't be available in your case with a standby node) and manually edit the tap-postgres Singer bookmarks in the Meltano "state" storage ...
j
Related: in the Postgres incremental key-based sync, you set the replication key. How is that stored, and can you alter it for a run?
c
(Slightly related but outdated thread on a similar topic, from before the Meltano variant of tap-postgres existed and before the Meltano SDK had 'BATCH' message support: https://meltano.slack.com/archives/CMN8HELB0/p1634133762289700)
Ha. Looks like you're already going down one of the routes ... 😉 ...
'meltano state list' and 'meltano state get' are your friends for inspecting those 'bookmarks'.
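As a rough illustration of inspecting those bookmarks from a script, here is a minimal Python sketch. It assumes that 'meltano state get' prints the state as JSON on stdout with the Singer state nested under a top-level "singer_state" key; the helper names and the state ID in the usage note are hypothetical, so check them against what your own project prints.

```python
import json
import subprocess


def fetch_state_json(state_id: str) -> str:
    """Return the raw text printed by `meltano state get <state_id>`.

    Assumes the `meltano` executable is on PATH and that the command is run
    from inside a Meltano project directory.
    """
    result = subprocess.run(
        ["meltano", "state", "get", state_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_bookmarks(state_json: str) -> dict:
    """Pull the per-stream bookmarks out of a Meltano state document.

    Assumption: the Singer state sits under "singer_state" and the
    per-stream entries under "bookmarks". Returns {} if either is missing.
    """
    state = json.loads(state_json)
    return state.get("singer_state", {}).get("bookmarks", {})
```

Something like `extract_bookmarks(fetch_state_json("dev:tap-postgres-to-target-postgres"))` would then show, per stream, which replication key and value the next run resumes from. The state ID shown is only an example; 'meltano state list' shows the real ones.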
t
We initialize large tables by dumping the data to a file using the database's native tools (MySQL, in our case), loading that data into the target using the target's native tools, then setting Meltano's state data so that it resumes from the "end" of the manually loaded data. We used to do that by tweaking the data in the internal Meltano database (.meltano/meltano.db, which is a SQLite DB), but the 'meltano state' commands are now a better option.
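The "set the state to the end of the manually loaded data" step could be sketched roughly as below. This is only an illustration under assumptions: the stream ID, replication key, and bookmark field names are hypothetical and vary by tap variant, so compare them with what 'meltano state get' shows after a small test run before relying on them.

```python
import json


def build_state(stream_id: str, replication_key: str, last_loaded_value: str) -> dict:
    """Build a state document whose bookmark points at the end of the bulk load.

    Assumptions: the state is wrapped in a top-level "singer_state" key, and
    the tap reads "replication_key" / "replication_key_value" from the
    stream's bookmark. Some taps store extra fields (for example a table
    version), which this sketch does not try to reproduce.
    """
    return {
        "singer_state": {
            "bookmarks": {
                stream_id: {
                    "replication_key": replication_key,
                    "replication_key_value": last_loaded_value,
                }
            }
        }
    }


def write_state_file(path: str, state: dict) -> None:
    """Write the state document to disk as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
```

After something like `write_state_file("state.json", build_state("public-events", "updated_at", "2024-01-01T00:00:00"))`, the file can be passed to 'meltano state set' with its --input-file option for the relevant state ID, and the next incremental run should pick up from that value.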
j
That is the approach we are taking as well. I just didn't know about the state commands, but found them late last night. So now I am testing a small load to get the process correct.