matheus_dantas
07/21/2023, 9:09 AM
We are extracting data from MySQL with tap-mysql. As most of these tables do not have a way to identify changes, we have decided to use the full_table replication method.
This data is loaded into Redshift using target-redshift. The problem is that, since we are using full-table extraction, I expected the target to simply truncate the table and reload everything. Instead, it loads the data into a staging table, using COPY to read from S3.
After that it compares the new data with the existing table and runs upserts (updating records that already exist and inserting new ones).
This is fine for small tables, but for large tables it becomes a problem. We have cases where the amount of data updated or created is small compared to the total number of records, yet the process still tries to update everything every day. Updates are expensive operations that we should avoid when possible.
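To make the difference concrete, here is a rough sketch of the SQL each strategy boils down to. The table, column, and S3 names are illustrative, not our real schema, and the actual statements target-redshift generates will differ in detail:

```python
# Illustrative sketch of the two load strategies. All identifiers
# (orders, orders_stg, the S3 path, the IAM role) are made up.

def upsert_sql(table, staging, key, cols):
    """Staging-table merge: UPDATE matched rows, then INSERT unmatched ones.
    Note this touches every row that appears in the staging table."""
    set_clause = ", ".join(f"{c} = s.{c}" for c in cols if c != key)
    col_list = ", ".join(cols)
    sel_list = ", ".join(f"s.{c}" for c in cols)
    return (
        f"UPDATE {table} SET {set_clause} "
        f"FROM {staging} s WHERE {table}.{key} = s.{key};\n"
        f"INSERT INTO {table} ({col_list}) "
        f"SELECT {sel_list} FROM {staging} s "
        f"LEFT JOIN {table} t ON s.{key} = t.{key} WHERE t.{key} IS NULL;"
    )

def truncate_and_load_sql(table, s3_path, iam_role):
    """Full-table alternative: wipe the target and bulk-load straight from S3.
    (In Redshift, TRUNCATE commits immediately, so it is not transactional.)"""
    return (
        f"TRUNCATE TABLE {table};\n"
        f"COPY {table} FROM '{s3_path}' IAM_ROLE '{iam_role}' FORMAT AS CSV;"
    )

print(upsert_sql("orders", "orders_stg", "id", ["id", "status"]))
print(truncate_and_load_sql(
    "orders", "s3://my-bucket/orders/", "arn:aws:iam::123456789012:role/redshift-copy"))
```

The first pattern is what we observe the target doing today; the second is what I expected full-table replication to do, since it avoids the per-row UPDATE cost entirely.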
Do you guys have a solution for this situation?

thomas_briggs
07/21/2023, 1:17 PM

thomas_briggs
07/21/2023, 1:18 PM

pat_nadolny
07/21/2023, 3:53 PM