laurent
11/09/2022, 4:56 PMtap-github
and state management. Specifically, I think that the tap is never issuing a proper "end" state message. I'm running it manually outside of any other code, just to isolate things a bit. I run it on a single stream (issue_comments
) for a single repo. The last STATE
message I get is
{
"type": "STATE",
"value": {
"bookmarks": {
"tempStream": {},
"repositories": {
"partitions": [
{
"context": {
"org": "nextcloud",
"repo": "server",
"repo_id": 60243197
}
}
]
},
"issue_comments": {
"partitions": [
{
"context": {
"org": "nextcloud",
"repo": "server"
},
"replication_key_signpost": "2022-11-09T16:44:57.220041+00:00",
"starting_replication_value": "2022-11-07",
"progress_markers": {
"Note": "Progress is not resumable if interrupted.",
"replication_key": "updated_at",
"replication_key_value": "2022-11-09T16:41:17Z"
}
}
]
}
}
}
}
What bugs me is that progress_markers
should not be in there, from what I understand. finalize_state_progress_markers
should remove this, and promote replication_key_value
one level up, so that it's used on the next run. But somehow it does not happen and our tap runs the entire stream everytime. This is with sdk 0.13.1 and the latest master
on the tap. However if I remove state_partitioning_keys = ["repo", "org"]
from the stream definition, then things work as I think they should.
Can someone confirm my understanding?
• the final state message should not contain progress_markers
at all
• tap-github
has a buggy stream definition, and I need to remove the state partitioning keys definition to use the default value
• If both statements above are true, I feel that the sdk shouldn't let me write such code, that will generate invalid state. Should there be an extra test for this?