James Meneghello
11/12/2024, 6:50 AMAndy Carter
11/12/2024, 8:10 AMkey-properties: ['id', 'updated_at']?
James Meneghello
11/12/2024, 1:06 PMhaleemur_ali
11/13/2024, 10:29 PMkey-properties
haleemur_ali
11/13/2024, 10:30 PMAndy Carter
11/14/2024, 9:11 AMAndy Carter
11/14/2024, 9:11 AMhaleemur_ali
11/14/2024, 11:50 AMhaleemur_ali
11/14/2024, 11:54 AMreplication-key
and key-properties
stream metadata fields act independently.
replication-key
is stored by meltano in the state and is used for incremental extracts
key-properties
are typically used in a couple of places: (1) deduplicate records before loading (2) upsert data from staging to target table.
if replication-key
is absent, then meltano resorts to full extracting strategy (vs incremental), if key-properties
is absent, then meltano resorts to append-only
loading strategy (vs upsert)Andy Carter
11/15/2024, 12:05 PMactivate_version
haleemur_ali
11/15/2024, 5:58 PMstg
& final
are referring to the staging and final target tables, and UniquenessViolation
represents a the pseudo code generated from the ON CONFLICT
clause:
append:
for row in stg:
insert(row, final)
upsert/merge:
for row in final:
if UniquenessViolation(row, final):
update(row, final)
else:
insert(row, final)
postgres performs updates by adding a new tuple, and marking the existing tuple as stale, and the stale tuples are cleaned up later in a vacuum process. note that insertion incurs no vaccum cost, but updates, upserts & deletes do, and the cost is related to the actual number of tuples that have to be marked stale. Since this particular use-case is to result in new rows appending to the table, there should be close to 0 vaccuum cost
The additional cost of the executing upsert over the insert statement is the branching.
Modern processors have branch prediction, so it will likely be close to 0, as UniquenessViolation(row, final)
should be mostly false since each check would utilize the timestamp (again, this statement assumes that the machine code that does execute enables branch prediction)