ben_theunissen
07/22/2024, 2:08 PM
I've been working on tap-iceberg for the last few days and am really happy with the current basic integration I've got going. Now I'm looking to add better support for incremental replication and, if possible, to also support tombstone/deletion rows and propagate those events to downstream targets.
I've got a few quick questions:
• For tap-iceberg, I ideally only want to support incremental replication on the Iceberg table's sort key, if present, as otherwise performance will be pretty terrible. Are there any specific fields I can set in the `Stream` object schema to "advertise" that a given field can be used as a replication key?
• In `get_records`, what is the most idiomatic way to access the replication key value (if provided) to filter on? I think I'm on the right track with my current change here, but it is not behaving as I expect in my testing. Are there also any pre-built tests for incremental replication/state that I can import into the tap as is?
• Supporting deletion rows: I've had some issues in the past grokking how this is meant to be implemented in taps. Iceberg natively supports deletion/tombstone rows. At the start of each incremental tap run, should I send a batch of messages corresponding to the rows deleted since the prior incremental run? Are there any good code examples of existing taps that support this?
Thanks!
Edgar Ramírez (Arch.dev)
07/23/2024, 4:10 PM
> Are there any specific fields I can set in the `Stream` object schema to "advertise" in the schema that the given field can be used as a replication key?
Not really. The Singer spec mentions `valid-replication-keys`:
> List of the fields that could be used as replication keys.
but the SDK does not implement it. I'll create an issue for it later this week, unless you wanna beat me to it 😉
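For reference, the Singer spec places `valid-replication-keys` in the stream-level catalog metadata, which is the entry with an empty breadcrumb. A tap could still write it into its catalog by hand. The following is a minimal sketch in plain Python that makes no SDK calls. It assumes the table's sort-key column names are already known. `build_stream_metadata` and `sort_key_fields` are hypothetical names, not SDK API.

```python
import json


def build_stream_metadata(
    key_properties: list[str],
    sort_key_fields: list[str],
) -> list[dict]:
    """Build stream-level Singer catalog metadata (breadcrumb []).

    Hypothetical helper: `sort_key_fields` would come from the Iceberg
    table's sort order in tap-iceberg, if the table defines one.
    """
    stream_level = {"table-key-properties": key_properties}
    if sort_key_fields:
        # Only advertise incremental keys when the table is sorted on them.
        stream_level["valid-replication-keys"] = sort_key_fields
    return [{"breadcrumb": [], "metadata": stream_level}]


metadata = build_stream_metadata(["id"], ["updated_at"])
print(json.dumps(metadata, indent=2))
```

Omitting the key for unsorted tables keeps the catalog from advertising incremental replication where it would be slow.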
> In `get_records`, what is the most idiomatic way to access the replication key value if provided to filter? I think I'm on the right track with my current change here
`get_starting_replication_key_value` is correct.
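In a stream's `get_records`, the value returned by `self.get_starting_replication_key_value(context)` becomes the lower bound. It may be `None` when there is no bookmark yet. The sketch below shows only the comparison semantics as a standalone function. `filter_incremental` is a hypothetical helper. For a sorted Iceberg table you would ideally push the same predicate into the scan rather than filter rows in Python.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator


def filter_incremental(
    rows: Iterable[dict[str, Any]],
    replication_key: str,
    start_value: Any | None,
) -> Iterator[dict[str, Any]]:
    """Yield rows whose replication-key value is >= the bookmark.

    In an SDK stream, `start_value` would be the result of
    `self.get_starting_replication_key_value(context)`. The comparison is
    inclusive, so the row that set the previous bookmark is emitted again:
    a duplicate is preferred over skipping rows that share the same value.
    """
    for row in rows:
        if start_value is None or row[replication_key] >= start_value:
            yield row


# ISO-8601 strings with the same format and offset compare correctly as text.
rows = [
    {"id": 1, "updated_at": "2024-07-20T00:00:00+00:00"},
    {"id": 2, "updated_at": "2024-07-21T00:00:00+00:00"},
    {"id": 3, "updated_at": "2024-07-22T00:00:00+00:00"},
]
new_rows = list(filter_incremental(rows, "updated_at", "2024-07-21T00:00:00+00:00"))
print([r["id"] for r in new_rows])  # → [2, 3]
```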
> but it is not behaving as I expect in my testing. Are there also any pre-built tests to test incremental replication/state that I can import into the tap as is?
There aren't. I think there should be, but I haven't come up with a design for testing incremental extraction that doesn't involve mocking the source, so I'm very open to ideas and suggestions 🙂
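One possible shape for such a test is sketched here under heavy assumptions. The idea is to run the sync twice against a small source you control, such as a throwaway local table, carry the state from the first run into the second, and assert on what the second run emits. Here, a hypothetical `sync` function over a Python list stands in for a real tap run, and its flat state dict is a simplification. A real test would invoke the tap and read its STATE messages instead.

```python
from __future__ import annotations

from typing import Any


def sync(
    rows: list[dict[str, Any]],
    replication_key: str,
    state: dict[str, Any],
) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Hypothetical stand-in for one incremental tap run over an in-memory source."""
    bookmark = state.get("replication_key_value")
    emitted = [r for r in rows if bookmark is None or r[replication_key] >= bookmark]
    new_state = dict(state)
    if emitted:
        new_state["replication_key_value"] = max(r[replication_key] for r in emitted)
    return emitted, new_state


source = [{"id": 1, "seq": 10}, {"id": 2, "seq": 20}]

# First run: no bookmark, so every row is emitted.
first, state = sync(source, "seq", {})
assert [r["id"] for r in first] == [1, 2]
assert state["replication_key_value"] == 20

# Append a row to the "source", then run again with the saved state.
source.append({"id": 3, "seq": 30})
second, state = sync(source, "seq", state)
# Inclusive bookmark: the boundary row (id 2) is re-emitted with the new row.
assert [r["id"] for r in second] == [2, 3]
assert state["replication_key_value"] == 30
```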
> Supporting deletion rows, I've had some issues in the past grokking how this is to be implemented in Taps, Iceberg supports a concept of deletion/tombstone rows natively, at the start of each incremental tap run should I send a batch of messages which correspond to the rows deleted since the prior incremental run? Any good code examples for existing Taps that support this?
Taps that support this usually do it within the context of `LOG_BASED` replication. Otherwise, `FULL_TABLE` replication has the optional `version` feature, for which targets can implement deleting rows not included in the new table "version".
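To make the `version` mechanism concrete: in a `FULL_TABLE` sync the tap tags every RECORD message with the same `version`, then emits an ACTIVATE_VERSION message once the table is complete. A target that supports this can then drop rows carrying an older version. In the sketch below, both functions are hypothetical and operate on plain dicts rather than SDK classes. The toy target hard-deletes stale rows. Real targets may soft-delete them instead or ignore the message entirely, and the `_version` column name is an assumption.

```python
from __future__ import annotations

from typing import Any


def full_table_messages(
    stream: str, rows: list[dict[str, Any]], version: int
) -> list[dict[str, Any]]:
    """Tap side: tag every RECORD of one FULL_TABLE sync with the same version."""
    messages = [
        {"type": "RECORD", "stream": stream, "record": row, "version": version}
        for row in rows
    ]
    # After all rows are sent, mark this version as the complete table.
    messages.append({"type": "ACTIVATE_VERSION", "stream": stream, "version": version})
    return messages


def apply_messages(
    table: dict[Any, dict[str, Any]], messages: list[dict[str, Any]], key: str
) -> None:
    """Toy target: upsert records, then hard-delete rows left over from older versions."""
    for message in messages:
        if message["type"] == "RECORD":
            row = dict(message["record"])
            row["_version"] = message["version"]  # hypothetical version column
            table[row[key]] = row
        elif message["type"] == "ACTIVATE_VERSION":
            stale = [k for k, row in table.items() if row["_version"] != message["version"]]
            for k in stale:
                del table[k]


table: dict[Any, dict[str, Any]] = {}
apply_messages(table, full_table_messages("orders", [{"id": 1}, {"id": 2}], version=1), "id")
# Row 2 was deleted at the source, so the next full sync no longer includes it.
apply_messages(table, full_table_messages("orders", [{"id": 1}], version=2), "id")
print(sorted(table))  # → [1]
```

Taps commonly use a millisecond timestamp as the version so that each full sync gets a larger value. The explicit 1 and 2 here just keep the example readable.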