Andy Carter
02/22/2023, 9:37 AM
In general, taps already know how to apply incremental replication if it is available for certain streams, so you only need to set INCREMENTAL and the tap handles it gracefully, including managing meltano state. The flip side is, if the API does not support a since timestamp param or similar, you can only get full table updates.
The replication-key stuff is more for the targets, as in, how to handle upserting new rows into a target database / sink. Again, in general, taps already know the appropriate keys to apply for most streams, but it's up to the target to know what to do when handed the pk. If you're using a target like jsonl then no upsert method is supported, and new rows from the tap get appended to the existing file. If you were to use a target that supports upsert syntax, then the replication key would be used for the on conflict (pk), for example if you are on postgres.
Is that mostly right?
visch
02/22/2023, 1:32 PM
> In general, taps already know how to apply incremental replication if it is available for certain streams, so you only need to set INCREMENTAL and the tap handles it gracefully, including managing meltano state. The flip side is, if the API does not support a since timestamp param or similar, you can only get full table updates.
Generally and ideally yes. since is pretty specific to an api but I think I get your point.
Just know that a lot of this is tap dependent, as in whether the tap does or doesn't support incremental for a given stream. It's not uncommon for a lot of endpoints to be full table even if they could technically support incremental, but that statement really depends on the tap. "It comes down to the tap" is the general answer you'll hit at some point.
> The replication-key stuff is more for the targets, as in, how to handle upserting new rows into a target database / sink. Again, in general, taps already know the appropriate keys to apply for most streams, but it's up to the target to know what to do when handed the pk. If you're using a target like jsonl then no upsert method is supported, and new rows from the tap get appended to the existing file. If you were to use a target that supports upsert syntax, then the replication key would be used for the on conflict (pk), for example if you are on postgres.
You're almost there. replication-key is actually just for taps, not targets. https://hub.meltano.com/singer/spec#:~:text=the%20replication%20type.-,replication%2Dkey,-All has a bit more info. In your previous paragraph the since timestamp would be updated via the replication-key, which comes from STATE. State is passed into taps when they are called via tap-name --config config.json --state state.json --catalog catalog.json; these 3 are the 3 to generally understand. state is the thing that is provided so the tap can do something unique based on "where" the last sync left off.
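A rough sketch of that "where the last sync left off" idea, not how any particular tap implements it. The users stream, the updated_at field, the bookmarks layout, and the sync_stream helper are all made up for illustration; real taps define their own state layout.

```python
import json

# Hypothetical source rows; a real tap would page through an API instead.
ROWS = [
    {"id": 1, "updated_at": "2023-02-20T10:00:00Z"},
    {"id": 2, "updated_at": "2023-02-21T09:30:00Z"},
    {"id": 3, "updated_at": "2023-02-22T08:15:00Z"},
]


def sync_stream(rows, state, stream="users", replication_key="updated_at"):
    """Return (records newer than the bookmark, updated state)."""
    # Assumed state layout: {"bookmarks": {stream: {replication_key: value}}}
    bookmarks = state.get("bookmarks", {})
    bookmark = bookmarks.get(stream, {}).get(replication_key)
    # ISO-8601 UTC strings in one consistent format compare correctly as strings.
    new_rows = [r for r in rows if bookmark is None or r[replication_key] > bookmark]
    new_rows.sort(key=lambda r: r[replication_key])
    if new_rows:
        bookmark = new_rows[-1][replication_key]
    new_state = {"bookmarks": {**bookmarks, stream: {replication_key: bookmark}}}
    return new_rows, new_state


# First run: no state, so every row is emitted.
first_records, state = sync_stream(ROWS, {})
# Second run: the saved state is passed back in, so nothing is re-emitted.
second_records, state_after = sync_stream(ROWS, state)

print(len(first_records), len(second_records))
print(json.dumps(state_after))
```

The point is only that the state handed back to the tap on the next run carries the highest replication-key value seen so far, and the tap uses it as its since filter.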
key-properties fit with your upsert explanation. https://hub.meltano.com/singer/spec#schemas goes over key_properties a bit more. Targets use key-properties to know what to upsert based on.
> If you were to use a target that supports upsert syntax, then the replication key would be used for the on conflict (pk), for example if you are on postgres.
Generally that's the method that should be used for upsert, but of course it depends on the target's implementation.
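To make the target side concrete, here is a toy loader that reads Singer-style SCHEMA and RECORD messages and upserts on whatever key_properties the SCHEMA message declares, falling back to append when there are none. The load function, the in-memory tables, and the sample data are assumptions for illustration; an actual target would translate this into something like on conflict for its own database.

```python
import json

# Singer-style messages as a tap might emit them, one JSON object per line.
# The stream and field names are made up for this example.
MESSAGES = """\
{"type": "SCHEMA", "stream": "users", "schema": {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}}}, "key_properties": ["id"]}
{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Ann"}}
{"type": "RECORD", "stream": "users", "record": {"id": 2, "name": "Bob"}}
{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Anne"}}
"""


def load(lines):
    """Toy target: upsert when a stream has key_properties, else append."""
    keys = {}    # stream -> list of key property names
    tables = {}  # stream -> {key tuple: record} (upsert) or list (append)
    for line in lines:
        msg = json.loads(line)
        stream = msg.get("stream")
        if msg["type"] == "SCHEMA":
            keys[stream] = msg.get("key_properties") or []
            tables.setdefault(stream, {} if keys[stream] else [])
        elif msg["type"] == "RECORD":
            record = msg["record"]
            if keys[stream]:
                pk = tuple(record[k] for k in keys[stream])
                tables[stream][pk] = record  # insert, or overwrite the same key
            else:
                tables[stream].append(record)  # append-only, like a file sink
    return tables


tables = load(MESSAGES.splitlines())
print(tables["users"])
```

The second record for id 1 replaces the first instead of creating a duplicate, which is the behaviour key_properties is there to enable.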
Andy Carter
02/22/2023, 9:59 PM
visch
02/22/2023, 10:38 PM
Andy Carter
02/23/2023, 8:19 AM
replication_key is what the tap uses to determine what rows are new/updated, and the primary_key is used by the target to add or replace the data as appropriate with the upsert.
Sometimes RK and PK are the same if you have an incrementing ID on a sql index and the table row never gets modified, but more often than not, in a standard CRUD app, it will be something like modified_at for your replication key and then id for your primary key for handling deduplication.
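A small illustration of why the two keys usually differ in a CRUD app, using made-up rows and a hypothetical changed_since helper. With id as the replication key only brand-new rows are picked up; with modified_at the edited row is picked up too, and id is then what lets the target replace it rather than duplicate it.

```python
# Hypothetical table snapshots; id is the primary key, modified_at changes on update.
before = [
    {"id": 1, "name": "Ann", "modified_at": "2023-02-20T10:00:00Z"},
    {"id": 2, "name": "Bob", "modified_at": "2023-02-21T09:30:00Z"},
]
after = [
    {"id": 1, "name": "Anne", "modified_at": "2023-02-23T08:00:00Z"},  # updated row
    {"id": 2, "name": "Bob", "modified_at": "2023-02-21T09:30:00Z"},
    {"id": 3, "name": "Cy", "modified_at": "2023-02-23T08:05:00Z"},    # new row
]


def changed_since(rows, replication_key, bookmark):
    """Rows whose replication key is strictly greater than the bookmark."""
    return [r for r in rows if r[replication_key] > bookmark]


# Bookmarks as they would stand after a first sync of `before`.
id_bookmark = max(r["id"] for r in before)
ts_bookmark = max(r["modified_at"] for r in before)

by_id = changed_since(after, "id", id_bookmark)
by_ts = changed_since(after, "modified_at", ts_bookmark)

print([r["id"] for r in by_id])  # ids picked up when id is the replication key
print([r["id"] for r in by_ts])  # ids picked up when modified_at is the replication key
```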