# singer-tap-development
s
I am trying to finish up some `tap-slack` issues and I've got a question: I opted for parent-child relationships between `channels <- messages <- threads`. However, I'd like to override the default state behavior for `threads` that emits a state message for every parent stream. So I thought I would set:
```python
@property
def state_partitioning_keys(self):
    "Remove channel_id and message_ts from state output."
    return []
```
But, this also appears to remove it from the record output, which is something I do want. Is there a recommended method for removing state while including the partition fields in the output?
I did go ahead and release v1 as well! https://github.com/MeltanoLabs/tap-slack/tree/0.1.1
a
Hi, @stephen_bailey. Because state partition keys are treated as top priority, they get added to the record automatically. To get them back while not using them as state keys, you should be able to inject them into the record during `post_process()`, getting the values from `context`.
Also, you can simply assign the state key overrides if you prefer the shorter syntax in place of the full property syntax:
```python
class ...:
    state_partitioning_keys = []
```
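The `post_process()` suggestion above can be sketched roughly as follows. This is a minimal illustration, not the tap's actual code: `ThreadsStreamSketch` is a hypothetical stand-in class so the snippet runs without the SDK installed, and the key names `channel_id` and `message_ts` are taken from the docstring in the question. It assumes the parent context is still passed to `post_process()` when the state partitioning keys are empty, as described above.

```python
from typing import Optional


class ThreadsStreamSketch:
    """Hypothetical stand-in for the tap's threads stream class."""

    # Empty list: do not partition state by the parent keys.
    state_partitioning_keys: list = []

    # Parent keys we still want to see in each emitted record (assumed names).
    parent_keys = ("channel_id", "message_ts")

    def post_process(self, row: dict, context: Optional[dict] = None) -> Optional[dict]:
        """Copy parent keys from the context into the record."""
        if context:
            for key in self.parent_keys:
                if key in context:
                    # setdefault keeps a value the API already returned.
                    row.setdefault(key, context[key])
        return row


stream = ThreadsStreamSketch()
record = stream.post_process(
    {"text": "hello"},
    {"channel_id": "C123", "message_ts": "1620000000.000100"},
)
print(record)
```

In a real tap the same two members would live on the stream subclass itself; only the record changes, and the state output no longer carries one entry per parent.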
s
nice, i do like that
question: if a tap is doing a full table sync, is there a reason to record the partitioning keys in the state logs?
a
Do you mean is there a reason to partition the stream, or is there a reason to store separate states per partition, or do you mean something else?
The partition states are still evidence that the partition completed - or at least that it ran. Even if not helpful for resumability.
s
Gotcha, that is what I was wondering. My impression was that `state` messages were only useful for resumability, but if there are processes that use it for evaluating progress, it makes sense to include it. The pain point here is just that the state messages can get really big with the parent-child relationship functionality.
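To make the size concern concrete, here is a rough sketch of how a partitioned state payload grows when every parent message gets its own entry. The `bookmarks` / `partitions` / `context` layout is an assumption about how the state is shaped, and the channel and message counts are made up for illustration.

```python
import json


def build_state(n_channels: int, messages_per_channel: int) -> dict:
    """Build a state dict with one partition entry per (channel, message) pair."""
    partitions = [
        {"context": {"channel_id": f"C{c}", "message_ts": f"{m}.000"}}
        for c in range(n_channels)
        for m in range(messages_per_channel)
    ]
    # Assumed layout: one list of partitions under the child stream's bookmark.
    return {"bookmarks": {"threads": {"partitions": partitions}}}


small = build_state(n_channels=2, messages_per_channel=3)
large = build_state(n_channels=50, messages_per_channel=200)

# The number of entries is the product of the parent counts.
print(len(small["bookmarks"]["threads"]["partitions"]))
print(len(large["bookmarks"]["threads"]["partitions"]))

# Serialized size in characters, which is what ends up in each state message.
print(len(json.dumps(small)), len(json.dumps(large)))
```

The entry count is multiplicative across parent levels, which is why a `channels <- messages <- threads` hierarchy inflates state so quickly.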
a
Yes, good callout. In the future, we may add cosmetic fields ("audit" columns, really) like `last_full_sync_on` to the completed message. Those don't change the sync operation but they do give visibility into what ran last time.