# troubleshooting
a
A couple Qs about replication trying to understand the behaviour I'm seeing:
• If `replication_method` is not specified for a stream, is `INCREMENTAL` the default?
• When setting `FULL_TABLE`, it looks like singer/meltano can actually figure out what the `primary_keys` should be? Is that right? I'm not setting a PK in python code or meltano.yaml for this stream, but the DDL shows the sql PK as ('context', 'id') which is correct for my table.
p
@Andy Carter are you building your own tap or is this an existing one, if so which one?
a
It's the `tap-instagram` loader
p
• If `replication_method` is not specified for a stream, is `INCREMENTAL` the default?
It depends on the tap implementation. The tap developer gets to define the default behavior. This one is an SDK-based tap, so these docs are helpful: https://sdk.meltano.com/en/latest/incremental_replication.html#example-code-timestamp-based-incremental-replication. Basically, if the tap defines a replication key (e.g. the media stream) then it will try to run in incremental mode, and if it's run by meltano then it will have state saved between syncs. Optionally this can be overridden a few different ways, for example using the `--full-refresh` flag for `run` would ignore state.
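To make the rule above concrete, here is a minimal Python sketch of how the effective replication method could be resolved. It is a simplified illustration of the behaviour described, not the SDK's actual code: the function name `effective_replication_method` and the `forced_method` parameter are hypothetical.

```python
from typing import Optional


def effective_replication_method(
    replication_key: Optional[str],
    forced_method: Optional[str] = None,
) -> str:
    """Resolve the replication method for a stream (illustrative only).

    replication_key: the column the tap developer declared for incremental
        syncs, or None if the stream declares none.
    forced_method: a hypothetical explicit override, e.g. "FULL_TABLE".
    """
    # An explicit override wins over whatever the tap declares.
    if forced_method:
        return forced_method
    # Otherwise, a declared replication key implies incremental mode.
    return "INCREMENTAL" if replication_key else "FULL_TABLE"


# Hypothetical streams: one with a replication key, one without.
with_key = effective_replication_method("timestamp")
without_key = effective_replication_method(None)
overridden = effective_replication_method("timestamp", forced_method="FULL_TABLE")
```

So a stream with a replication key resolves to incremental unless something explicitly overrides it, which matches the behaviour seen when `replication_method` is left unset.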
• When setting `FULL_TABLE`, it looks like singer/meltano can actually figure out what the `primary_keys` should be? Is that right? I'm not setting a PK in python code or meltano.yaml for this stream, but the DDL shows the sql PK as ('context', 'id') which is correct for my table.
Yep, similarly the tap developer can set the primary keys for a stream, e.g. see the media stream again. Some source systems/APIs have static PKs, so the tap developer defines them in the tap, whereas for others like tap-csv the PKs are set as config values, since everyone's CSV files are different.
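For context on how the keys reach the database: in the Singer spec, a tap announces each stream with a SCHEMA message that carries a `key_properties` list, and a target may use that list when it creates the table. The sketch below builds such a message in plain Python. The stream name, column types, and helper names are hypothetical, and whether a given target actually turns `key_properties` into a SQL primary key depends on that target.

```python
import json
from typing import Dict, List, Sequence, Tuple


def build_schema_message(
    stream: str,
    properties: Dict[str, dict],
    key_properties: Sequence[str],
) -> dict:
    """Build a Singer-style SCHEMA message (illustrative helper)."""
    return {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {"type": "object", "properties": properties},
        # The tap's declared primary keys travel with the schema.
        "key_properties": list(key_properties),
    }


def primary_key_columns(message: dict) -> Tuple[str, ...]:
    """What a target could read back to decide on a table's primary key."""
    keys: List[str] = message.get("key_properties", [])
    return tuple(keys)


# Hypothetical stream whose keys match the ones seen in the DDL.
message = build_schema_message(
    stream="example_stream",
    properties={"context": {"type": "string"}, "id": {"type": "string"}},
    key_properties=["context", "id"],
)
serialized = json.dumps(message)  # one line of tap output
pk = primary_key_columns(json.loads(serialized))
```

If the keys are not visible on the stream class itself, the tap's emitted SCHEMA messages are one place to check what it is actually declaring.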
a
Thanks Pat, appreciate the explanations. That makes sense that if `replication_key` is set then the behaviour is `INCREMENTAL` even if not explicitly stated. On the second point of primary keys, I cannot see anywhere, even in the tap repo, where the PK for the stream is defined, yet in the resulting sql database the PK comes out as ('id', 'context'), which is appropriate for the stream. I'm just curious as to how that could happen. I'm probably missing something simple though, I'll keep looking on this side.