ken_payne
06/17/2021, 8:58 PM
I'm moving tap-tableau-server over to using the excellent hierarchical streams feature and am struggling to understand what is meant by point 4 here regarding state_partitioning_keys 🤔 I don't want "one bookmark per parent item", but I don't exactly know which keys to use or what behaviour to expect from setting them. Is this equivalent to a replication_key on a 'normal' Stream? Ideally I wouldn't store any state for my child streams, as it would be meaningless anyway (children cannot change independently of the parent, so only the parent's replication_key matters). On that basis, is state_partitioning_keys = [] sensible?
aaronsteers
06/17/2021, 10:11 PM
> On that basis, is state_partitioning_keys = [] sensible?
Yep, I think you've nailed it. According to Singer conventions, there's no official concept of "state inheritance" between one stream type and another, so I think at minimum we would track a single bookmark per stream type. Given that, how does this sound:
• The Workbook stream type uses the updated-at time for bookmarking, with one bookmark total for the stream type.
• All others have state_partitioning_keys = [] as you describe, meaning the "set" of keys to be maintained is just one top-level item per stream type (effectively ignoring parent context when writing and managing stream state).
aaronsteers
06/17/2021, 10:12 PM
aaronsteers
06/17/2021, 10:15 PM
ken_payne
06/17/2021, 10:24 PM
aaronsteers
06/17/2021, 10:37 PM
> Are you saying it is necessary to set a replication key on children even if I never refer to it?
Yes, I think that is correct. From the naive user's perspective, I think you want to act as though the parent's replication key actually belongs to the child. Meaning, if I were to inspect the bookmark for the child stream, and if I knew nothing about the parent-child implementation details, I would see a marker that says "this child stream hasn't been updated since last Tuesday". The behind-the-scenes detail of exactly where that replication key is sourced from can then be ignored when interacting with the tap. In this case, all replication keys come from the ultimate parent stream, but the tap user doesn't have to know that. And functionally, if a given workbook hasn't been modified since Tuesday, and the child stream's replication key says it was updated on Wednesday, we know (at any layer) that the Workbook and child streams do not need to be updated.
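For anyone reading along later, here is a rough sketch of the two ideas in this thread, written as a toy model in plain Python rather than SDK code. The helper names (partition_slot, count_bookmarks, needs_sync), the workbook_id context key, and the timestamps are all made up for illustration, and the real SDK may differ in how it resolves partitions. It only aims to show how an empty state_partitioning_keys list collapses state to one bookmark per stream type, and how a bookmark taken from the parent's updated-at value decides whether a workbook and its children need syncing.

```python
from datetime import datetime
from typing import Optional

# Toy model only: these helpers are hypothetical and NOT part of the Meltano SDK.


def partition_slot(context: Optional[dict], state_partitioning_keys: Optional[list]) -> tuple:
    """Return the key under which a bookmark would be stored for this context."""
    if context is None:
        return ()
    if state_partitioning_keys is None:
        # Assumed default in this model: one bookmark per distinct parent context.
        keys = sorted(context)
    else:
        # Only the listed keys distinguish bookmarks; [] means a single shared slot.
        keys = [k for k in state_partitioning_keys if k in context]
    return tuple((k, context[k]) for k in keys)


def count_bookmarks(contexts: list, state_partitioning_keys: Optional[list]) -> int:
    """Count how many distinct bookmarks the given parent contexts would produce."""
    return len({partition_slot(c, state_partitioning_keys) for c in contexts})


def needs_sync(parent_updated_at: str, bookmark: Optional[str]) -> bool:
    """A parent (and so its children) needs syncing only if modified after the bookmark."""
    if bookmark is None:
        return True  # no state yet, so sync everything
    return datetime.fromisoformat(parent_updated_at) > datetime.fromisoformat(bookmark)


# Three parent workbooks, each passing its id down as child context (made-up ids).
workbook_contexts = [{"workbook_id": "a"}, {"workbook_id": "b"}, {"workbook_id": "c"}]

per_parent = count_bookmarks(workbook_contexts, None)  # one bookmark per workbook
single = count_bookmarks(workbook_contexts, [])        # one bookmark for the stream type

# Workbook last modified Tuesday, bookmark from Wednesday: nothing to do.
skip_case = needs_sync("2021-06-15T09:00:00", "2021-06-16T09:00:00")
# Workbook modified Thursday, after the Wednesday bookmark: re-sync it and its children.
sync_case = needs_sync("2021-06-17T09:00:00", "2021-06-16T09:00:00")
```

In this model the child streams never compare anything themselves. The decision is made once from the parent's updated-at value, which is why storing a per-parent bookmark for each child would add nothing.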
aaronsteers
06/17/2021, 10:38 PM
niall_woodward
01/09/2022, 11:09 PM