Hi currently our Meltano production pipeline does not replic Meltano #best-practices

huiming

07/25/2023, 9:47 AM

Hi, currently our Meltano production pipeline does not replicate rows reliably; some rows go missing once in a while. We are following up on the problem in https://meltano.slack.com/archives/C01TCRBBJD7/p1689945178346729 Meanwhile, to recover the production warehouse, is there a way for us to partially resync the missing rows? Something equivalent to PipelineWise's Resync Tables feature: https://transferwise.github.io/pipelinewise/user_guide/resync.html

visch

07/25/2023, 9:04 PM

I don't have an answer for you but as a curiosity are you using Log Based replication in tap-postgres? Or just incremental?

huiming

07/25/2023, 9:41 PM

We use incremental. Do you think incremental is reliable, or should we switch to log-based?

visch

07/25/2023, 11:07 PM

Incremental should be just fine. I'd look into why things are "unreliable". I don't miss anything on my side, fwiw.

huiming

07/25/2023, 11:34 PM

Derek, do you use both tap-postgres (transferwise variant) and target-redshift on your side? I wonder whether the missing rows issue is more likely caused by the tap or the target.

huiming

07/25/2023, 11:46 PM

Are there any caveats on the choice of replication key column in the key-based incremental replication method? We use the updated_at column as the key. It contains UTC timestamp values.

janis_puris

07/26/2023, 8:12 AM

@huiming in the run logs, do you see the select query run against the source? If you do, is >= the predicate's operator?
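To see why the comparison operator matters, here is a minimal sketch using an in-memory SQLite table. It is not what tap-postgres actually runs; the `orders` table, its rows, and the `incremental_ids` helper are hypothetical. It only shows that a strict `>` against the saved bookmark skips a row that shares the bookmark's timestamp but arrived after the previous run, while `>=` picks it up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2023-07-25T09:00:00Z"), (2, "2023-07-25T09:00:05Z")],
)

# First run: the bookmark becomes the largest updated_at seen so far.
bookmark = conn.execute("SELECT MAX(updated_at) FROM orders").fetchone()[0]

# A row with the same timestamp as the bookmark lands after the first run.
conn.execute("INSERT INTO orders VALUES (3, '2023-07-25T09:00:05Z')")


def incremental_ids(op):
    """Return the ids an incremental query would fetch with the given operator."""
    # Only these two literals are ever interpolated into the SQL text.
    assert op in (">", ">=")
    sql = f"SELECT id FROM orders WHERE updated_at {op} ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (bookmark,))]


strict_ids = incremental_ids(">")
inclusive_ids = incremental_ids(">=")
print(strict_ids)     # [] : row 3 is never replicated
print(inclusive_ids)  # [2, 3] : row 2 is re-read, row 3 is caught
```

With `>=`, the row at the bookmark is extracted again on every run, so this only works cleanly if the target upserts on the primary key.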

huiming

07/26/2023, 8:50 AM

We lost the Airflow/Meltano logs from the period when the rows went missing. Probably the app container hung and didn't have time to upload the logs to the S3 bucket.

huiming

07/26/2023, 8:55 AM

I have a question on key-based incremental replication: How does tap/target actually perform the first full table sync, for a new table? Is the first full table sync performed differently from the subsequent incremental sync? And a few more follow-up questions:
• What will happen when the first full table sync is interrupted by a force quit?
• Can tap/target still resume the full table sync later?
• When resuming, how does tap/target know where to continue?

janis_puris

07/26/2023, 1:20 PM

How does tap/target actually perform the first full table sync, for a new table?

Entirely depends on tap and target

Is the first full table sync performed differently from the subsequent incremental sync?

Yes. Typically, for a full table sync the tap runs the extraction with no replication key, i.e. "fetch everything". How the target behaves depends on the tap, e.g. if it is a database, the target may recreate the table or truncate it before loading data into it.
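As a rough illustration of that difference, the sketch below builds the SELECT a key-based tap might issue. The `build_select` helper is hypothetical and not taken from any real tap; it assumes a `%s` placeholder style as used by psycopg2-style drivers, and it interpolates identifiers without sanitizing them, which a real tap should not do.

```python
from typing import Optional, Tuple


def build_select(
    table: str, replication_key: str, bookmark: Optional[str]
) -> Tuple[str, tuple]:
    """Return (sql, params) for one sync of a hypothetical key-based tap."""
    if bookmark is None:
        # No saved state yet: the first sync fetches everything.
        return f"SELECT * FROM {table} ORDER BY {replication_key}", ()
    # Saved state exists: only fetch rows at or after the bookmark.
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE {replication_key} >= %s ORDER BY {replication_key}"
    )
    return sql, (bookmark,)


first_sql, first_params = build_select("orders", "updated_at", None)
next_sql, next_params = build_select("orders", "updated_at", "2023-07-25T09:00:05Z")
print(first_sql)
print(next_sql, next_params)
```

Ordering by the replication key is an assumption of the sketch: it is what would make a bookmark written part-way through a run safe to resume from.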

Can tap/target still resume the full table sync later?

A full table sync cannot be "resumed" unless you run an incremental sync afterwards. The target should emit "state" throughout the loading. Whether it is emitted, and at what cadence, is entirely up to the target.

When resuming, how does tap/target know where to continue?

This is determined by the state. Take a look at https://hub.meltano.com/singer/spec/ At the end of the day, it is up to the taps and targets to tell Singer what to do, how to do it, and so on.
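A minimal sketch of how state can drive a resume, assuming Singer-style RECORD and STATE messages. The Singer spec leaves the contents of a STATE message's `value` to the tap, so the `bookmarks` layout below is an assumed convention, and the `sync` function, the `orders` stream, and the `state_every` cadence are hypothetical.

```python
import itertools
import json


def sync(rows, state, stream="orders", key="updated_at", state_every=2):
    """Yield Singer-style messages for rows at or after the saved bookmark."""
    bookmark = (
        state.get("bookmarks", {}).get(stream, {}).get("replication_key_value")
    )
    pending = [
        r for r in sorted(rows, key=lambda r: r[key])
        if bookmark is None or r[key] >= bookmark
    ]
    for i, row in enumerate(pending, start=1):
        yield {"type": "RECORD", "stream": stream, "record": row}
        # Emit a bookmark every few records and once more at the end.
        if i % state_every == 0 or i == len(pending):
            yield {"type": "STATE", "value": {"bookmarks": {stream: {
                "replication_key": key,
                "replication_key_value": row[key],
            }}}}


rows = [
    {"id": 1, "updated_at": "2023-07-25T09:00:00Z"},
    {"id": 2, "updated_at": "2023-07-25T09:00:01Z"},
    {"id": 3, "updated_at": "2023-07-25T09:00:02Z"},
    {"id": 4, "updated_at": "2023-07-25T09:00:03Z"},
]

# Simulate a force quit: only the first three messages get through.
first_run = list(itertools.islice(sync(rows, {}), 3))
saved_state = [m["value"] for m in first_run if m["type"] == "STATE"][-1]

# The next run starts from the last bookmark that was saved.
resumed_ids = [
    m["record"]["id"] for m in sync(rows, saved_state) if m["type"] == "RECORD"
]
print(json.dumps(saved_state))
print(resumed_ids)  # [2, 3, 4]
```

If the run is killed before any STATE message has been persisted, there is nothing to resume from and the next run starts over, which matches the point above that the cadence of state emission decides how much work is repeated.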
