Hi, currently our Meltano production pipeline does...
# best-practices
h
Hi, currently our Meltano production pipeline does not replicate rows reliably; some rows go missing once in a while. We are following up on the problem in https://meltano.slack.com/archives/C01TCRBBJD7/p1689945178346729 Meanwhile, to recover the production warehouse, is there a way for us to partially resync the missing rows? Something equivalent to PipelineWise's Resync Tables feature: https://transferwise.github.io/pipelinewise/user_guide/resync.html
v
I don't have an answer for you, but out of curiosity, are you using log-based replication in tap-postgres? Or just incremental?
h
We use incremental. Do you think incremental is reliable, or should we switch to log-based?
v
Incremental should be just fine. I'd look into why things are "unreliable". I'm not missing anything on my side, fwiw.
h
Derek, do you use both tap-postgres (transferwise variant) and target-redshift on your side? I wonder whether the missing rows issue is more likely caused by the tap or the target.
Are there any caveats on the choice of replication key column in the key-based incremental replication method? We use the updated_at column as the key. It contains UTC timestamp values.
j
@huiming in the run logs, do you see the select query run against the source? If you do, is it > or >= as the predicate's operator?
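To show why the operator matters, here is a minimal sketch using an in-memory SQLite table. The `orders` table, its columns, and the `incremental_ids` helper are hypothetical and only stand in for a source table with an `updated_at` replication key; this is not the query tap-postgres actually builds.

```python
import sqlite3

# Hypothetical source table; table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2023-07-21T10:00:00"), (2, "2023-07-21T10:00:05")],
)


def incremental_ids(op, bookmark):
    """Return the ids an incremental query would select with the given operator."""
    if op not in (">", ">="):
        raise ValueError("unsupported operator")
    sql = f"SELECT id FROM orders WHERE updated_at {op} ? ORDER BY updated_at, id"
    return [row[0] for row in conn.execute(sql, (bookmark,))]


# The previous run saw rows 1 and 2, so the bookmark is the largest updated_at seen.
bookmark = "2023-07-21T10:00:05"

# Row 3 is committed after that run but carries the same timestamp as the bookmark.
conn.execute("INSERT INTO orders VALUES (3, '2023-07-21T10:00:05')")
conn.execute("INSERT INTO orders VALUES (4, '2023-07-21T10:00:09')")

strict = incremental_ids(">", bookmark)      # skips row 3
inclusive = incremental_ids(">=", bookmark)  # re-reads row 2, picks up row 3
```

With `>`, a row that shares the bookmark timestamp is never selected. With `>=`, the row at the bookmark is read again, so the target needs to upsert on the primary key to avoid duplicates.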
h
We lost the Airflow/Meltano logs from the period when the rows went missing; the app container probably hung before it had time to upload the logs to the S3 bucket.
I have a question on key-based incremental replication: How does tap/target actually perform the first full table sync, for a new table? Is the first full table sync performed differently from the subsequent incremental sync? And a few more follow-up questions:
• What will happen when the first full table sync is interrupted by a force quit?
• Can tap/target still resume the full table sync later?
• When resuming, how does tap/target know where to continue?
j
How does tap/target actually perform the first full table sync, for a new table?
That depends entirely on the tap and the target.
Is the first full table sync performed differently from the subsequent incremental sync?
Yes. Typically, for a full table sync the tap queries the source with no replication key predicate, i.e. "fetch everything". How the target behaves depends on the target: for example, if it loads into a database, it may recreate the table or truncate it before loading data into it.
Can tap/target still resume the full table sync later?
A full table sync cannot be "resumed" unless the run that follows is an incremental one. The target should emit "state" throughout the loading. Whether it is emitted, and at what cadence, is entirely up to the target.
When resuming, how does tap/target know where to continue?
This is determined by the state. Take a look at https://hub.meltano.com/singer/spec/ At the end of the day, it is up to the taps and targets to tell Singer what to do, how to do it, and so on.
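As a rough illustration of how state drives a resume, here is a toy simulation. The RECORD and STATE message shapes follow the Singer spec, but `run_tap`, `run_target`, the `bookmarks` layout, and the `crash_after` parameter are hypothetical simplifications, not how any particular tap or target is implemented.

```python
def run_tap(rows, state, stream="orders", key="updated_at", state_every=2):
    """Yield Singer-style RECORD and STATE messages for rows at or after the bookmark."""
    bookmark = state.get("bookmarks", {}).get(stream, {}).get("replication_key_value")
    # No bookmark means "fetch everything"; otherwise read from the bookmark onwards.
    pending = sorted(
        (r for r in rows if bookmark is None or r[key] >= bookmark),
        key=lambda r: r[key],
    )
    for count, row in enumerate(pending, start=1):
        yield {"type": "RECORD", "stream": stream, "record": row}
        if count % state_every == 0 or count == len(pending):
            value = {"bookmarks": {stream: {"replication_key": key,
                                            "replication_key_value": row[key]}}}
            yield {"type": "STATE", "value": value}


def run_target(messages, warehouse, crash_after=None):
    """Upsert records into `warehouse` by id and return the last state seen."""
    last_state = {}
    for n, msg in enumerate(messages, start=1):
        if crash_after is not None and n > crash_after:
            break  # simulate a force quit in the middle of the run
        if msg["type"] == "RECORD":
            warehouse[msg["record"]["id"]] = msg["record"]
        elif msg["type"] == "STATE":
            last_state = msg["value"]
    return last_state


ROWS = [{"id": i, "updated_at": f"2023-07-21T10:00:0{i}"} for i in range(1, 6)]
warehouse = {}

# First run: empty state, so everything is read; the run dies after 4 messages.
state_after_crash = run_target(run_tap(ROWS, {}), warehouse, crash_after=4)
ids_after_crash = sorted(warehouse)

# Second run: continues from the last bookmark that was saved before the crash.
final_state = run_target(run_tap(ROWS, state_after_crash), warehouse)
ids_after_resume = sorted(warehouse)
```

In this sketch the second run starts from the last saved bookmark rather than from the exact row where the first run died, so some rows are loaded twice and the target has to upsert. It also relies on the tap emitting rows ordered by the replication key; without that ordering, an intermediate bookmark would not be a safe place to resume from.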