# singer-tap-development
k
Hey 👋 I am working on porting `tap-tableau-server` over to using the excellent hierarchical streams feature and am struggling to understand what is meant by point 4 here regarding `state_partitioning_keys` 🤔 I don't want "one bookmark per parent item", but I don't exactly know what keys to use or what behaviour to expect from setting them. Is this equivalent to a `replication_key` on a 'normal' Stream? Ideally I wouldn't store any state for my child streams, as it would be meaningless anyway (children cannot change independently of the parent, so only the parent's `replication_key` matters). On that basis, is `state_partitioning_keys = []` sensible?
a
> On that basis, is `state_partitioning_keys = []` sensible?
Yep, I think you’ve nailed it. According to Singer conventions, there’s no official concept of “state inheritance” between one stream type and another, so I think at minimum we would track a single bookmark per stream type. Given that, how does this sound:
• The `Workbook` stream type uses the updated-at time for bookmarking, one bookmark total for the stream type.
• All others have `state_partitioning_keys = []` as you describe, meaning the “set” of keys to be maintained is just one top-level item per stream type (effectively ignoring parent context when writing and managing stream state).
@ken_payne - Does this meet the behavior you are looking for?
Because streams generally want to know their own replication_key and parent/partition keys, you might additionally need to inject those from the parent context into the child records.
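To make the effect of `state_partitioning_keys = []` concrete, here is a minimal pure-Python sketch of the bookkeeping idea. It is a simplified model and not the SDK's implementation: `partition_key` and `write_bookmark` are hypothetical helpers, and the `workbook_id` context key and the timestamps are made up for illustration.

```python
from typing import Optional


def partition_key(context: dict, state_partitioning_keys: Optional[list]) -> tuple:
    """Reduce a parent context to the keys that identify a state partition.

    Assumption: None means "use the full context" (one bookmark per parent
    item), while [] means "ignore the context" (one bookmark per stream).
    """
    keys = context.keys() if state_partitioning_keys is None else state_partitioning_keys
    return tuple(sorted((k, context[k]) for k in keys))


def write_bookmark(state: dict, context: dict, value: str,
                   state_partitioning_keys: Optional[list] = None) -> None:
    """Store the bookmark under the reduced key, keeping the greatest value."""
    key = partition_key(context, state_partitioning_keys)
    state[key] = max(state.get(key, value), value)


# Hypothetical parent contexts with the parent's updated-at value.
contexts = [
    ({"workbook_id": "wb-1"}, "2021-06-01T00:00:00Z"),
    ({"workbook_id": "wb-2"}, "2021-06-02T00:00:00Z"),
]

per_parent: dict = {}
single: dict = {}
for ctx, updated_at in contexts:
    write_bookmark(per_parent, ctx, updated_at)   # full context: one entry per workbook
    write_bookmark(single, ctx, updated_at, [])   # []: every context collapses to one entry

print(len(per_parent))  # 2
print(single)           # {(): '2021-06-02T00:00:00Z'}
```

With the empty list, every parent context maps to the same (empty) partition key, so the child stream ends up with a single top-level bookmark no matter how many workbooks were visited.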
k
Thanks AJ, that makes sense. I am passing the parent ids and the top-level `replication_key` down into the records of all children (workbooks are 5 or 6 layers deep in places 😅) but not setting any `replication_key`s on children. I have the option to just use the top-level `updated_at` replication key for each child, but as I will never retrieve this for any children I didn’t see the point 🤔 Are you saying it is necessary to set a replication key on children even if I never refer to it?
a
> Are you saying it is necessary to set a replication key on children even if I never refer to it?
Yes, I think that is correct. I think you want to act as though, from the naive user’s perspective, the parent’s replication key actually belongs to the child. Meaning, if I were to inspect the bookmark for the child stream, knowing nothing about the parent-child implementation details, I would see a marker that says “this child stream hasn’t been updated since last Tuesday”. The behind-the-scenes detail of exactly where that replication key is sourced from can then be ignored when interacting with the tap. In this case, all replication keys come from the ultimate parent stream, but the tap user doesn’t have to know that. And functionally, if a given workbook hasn’t been modified since Tuesday, and the child stream’s replication key says it was updated on Wednesday, we know (at any layer) that the Workbook and its child streams do not need to be updated.
Hopefully that makes sense. But admittedly, there is probably more to do in terms of documenting use cases like these. I don’t know if we have put anything down in the docs yet on exactly how to configure replication keys of child streams.
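As a rough illustration of the idea above, the sketch below copies the parent's `updated_at` into each child record and then treats the greatest emitted value as the child's bookmark. All names here (`inject_parent_fields`, `child_bookmark`, `workbook_needs_sync`, the `view` records) are hypothetical, and the inclusive `>=` comparison on ISO 8601 strings is an assumption rather than documented SDK behaviour.

```python
from typing import Iterable, Optional


def inject_parent_fields(child: dict, parent_context: dict) -> dict:
    """Return a copy of the child record carrying the parent's id and updated_at."""
    return {
        **child,
        "workbook_id": parent_context["workbook_id"],
        "updated_at": parent_context["updated_at"],
    }


def child_bookmark(records: Iterable[dict]) -> Optional[str]:
    """The child's bookmark is the greatest replication-key value it has emitted."""
    return max((r["updated_at"] for r in records), default=None)


def workbook_needs_sync(workbook_updated_at: str, bookmark: Optional[str]) -> bool:
    """Assumed rule: sync when there is no bookmark or the workbook is at least as new."""
    return bookmark is None or workbook_updated_at >= bookmark


# Hypothetical parent context: a workbook last modified on Wednesday 2021-06-02.
parent_context = {"workbook_id": "wb-1", "updated_at": "2021-06-02T00:00:00Z"}
views = [{"view_id": "v-1"}, {"view_id": "v-2"}]

emitted = [inject_parent_fields(v, parent_context) for v in views]
bookmark = child_bookmark(emitted)

print(bookmark)                                              # 2021-06-02T00:00:00Z
print(workbook_needs_sync("2021-06-01T00:00:00Z", bookmark)) # False (Tuesday is older)
print(workbook_needs_sync("2021-06-03T00:00:00Z", bookmark)) # True
```

Someone inspecting only the child's bookmark sees a plain "last updated" marker, without needing to know that the value originated in the parent stream.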
n
Sorry to dig up an old thread, want to make sure I understand. For my example I have two streams, tickets and ticket messages. Ticket messages are a child stream of tickets. The ticket stream's replication key is updated_datetime, and that is updated whenever a message is added or updated that relates to the ticket. The behaviour I'd like is that for any tickets with an updated_datetime greater than or equal to the latest seen, the corresponding messages for that ticket are also pulled. I've set the messages stream's state_partitioning_keys to [] - does that sound correct?
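The desired behaviour in the tickets example can be written down as a small pure-Python sketch. It only restates the rule described above (pull messages for tickets whose `updated_datetime` is at or after the latest seen value); it does not confirm what the SDK does when `state_partitioning_keys = []` is set. The function `messages_to_pull`, the field names `id` and `ticket_id`, and the sample data are all hypothetical.

```python
from typing import Iterator


def messages_to_pull(tickets: list, messages_by_ticket: dict, last_seen: str) -> Iterator[dict]:
    """Yield messages for every ticket updated at or after the last seen value.

    Each message is stamped with the parent's id and updated_datetime so the
    child stream can carry the parent's replication-key value.
    """
    for ticket in tickets:
        if ticket["updated_datetime"] >= last_seen:  # inclusive, as described
            for message in messages_by_ticket.get(ticket["id"], []):
                yield {
                    **message,
                    "ticket_id": ticket["id"],
                    "updated_datetime": ticket["updated_datetime"],
                }


# Hypothetical sample data.
tickets = [
    {"id": 1, "updated_datetime": "2021-06-01T00:00:00Z"},
    {"id": 2, "updated_datetime": "2021-06-03T00:00:00Z"},
]
messages_by_ticket = {
    1: [{"message_id": "m-1"}],
    2: [{"message_id": "m-2"}, {"message_id": "m-3"}],
}

pulled = list(messages_to_pull(tickets, messages_by_ticket, "2021-06-02T00:00:00Z"))
print([m["message_id"] for m in pulled])  # ['m-2', 'm-3']
```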