We are scheduling an ELT job for one of our product tables, where data is stored per order id (the primary key) and the data for each order gets updated over time. Now we are puzzled over which replication method we should adopt for this. We have an updated_at column for each order id, but if we use INCREMENTAL replication, it creates multiple rows per order id, one for each update. (LOG_BASED replication is not available with the database we are using.) Is there any way forward so that we can keep our records unique while still getting the updated data?
03/23/2021, 2:30 PM
We would recommend having a deduplication and validation step using dbt. In the ELT paradigm it’s fairly common to have a “base” layer on top of the raw data that dedupes, casts columns, etc.
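To make the dedupe step concrete, here is a minimal sketch of the logic in Python: for each order id, keep only the row with the latest updated_at. The column names order_id and updated_at come from the question above; the function name dedupe_latest, the status column, and the sample rows are made up for illustration. In dbt itself this would normally be written as a SQL model (for example, a window function that ranks rows per order id by updated_at and keeps the first), so treat this as a picture of the idea rather than the actual dbt implementation.

```python
from datetime import datetime


def dedupe_latest(rows, key="order_id", ts="updated_at"):
    """Return one row per key, keeping the row with the greatest timestamp.

    Hypothetical helper: `rows` is any iterable of dicts that contain
    the `key` and `ts` fields.
    """
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # Keep this row if we have not seen the key yet, or if it is newer.
        if current is None or row[ts] > current[ts]:
            latest[row[key]] = row
    return list(latest.values())


# Illustrative raw data as INCREMENTAL replication might append it:
# order 1 appears twice because it was updated after the first sync.
raw_rows = [
    {"order_id": 1, "status": "placed", "updated_at": datetime(2021, 3, 20, 9, 0)},
    {"order_id": 1, "status": "shipped", "updated_at": datetime(2021, 3, 21, 14, 30)},
    {"order_id": 2, "status": "placed", "updated_at": datetime(2021, 3, 22, 8, 15)},
]

deduped = dedupe_latest(raw_rows)
```

With the sample rows above, deduped holds one row per order id, and order 1 keeps its most recent version. The raw table stays append-only, and the "base" layer is what downstream models read from.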