laurent
11/09/2022, 4:56 PMtap-github
and state management. Specifically, I think that the tap is never issuing a proper "end" state message. I'm running it manually outside of any other code, just to isolate things a bit. I run it on a single stream (issue_comments
) for a single repo. The last STATE
message I get is
{
"type": "STATE",
"value": {
"bookmarks": {
"tempStream": {},
"repositories": {
"partitions": [
{
"context": {
"org": "nextcloud",
"repo": "server",
"repo_id": 60243197
}
}
]
},
"issue_comments": {
"partitions": [
{
"context": {
"org": "nextcloud",
"repo": "server"
},
"replication_key_signpost": "2022-11-09T16:44:57.220041+00:00",
"starting_replication_value": "2022-11-07",
"progress_markers": {
"Note": "Progress is not resumable if interrupted.",
"replication_key": "updated_at",
"replication_key_value": "2022-11-09T16:41:17Z"
}
}
]
}
}
}
}
What bugs me is that progress_markers
should not be in there, from what I understand. finalize_state_progress_markers
should remove this, and promote replication_key_value
one level up, so that it's used on the next run. But somehow it does not happen and our tap runs the entire stream everytime. This is with sdk 0.13.1 and the latest master
on the tap. However if I remove state_partitioning_keys = ["repo", "org"]
from the stream definition, then things work as I think they should.
Can someone confirm my understanding?
• the final state message should not contain progress_markers
at all
• tap-github
has a buggy stream definition, and I need to remove the state partitioning keys definition to use the default value
• If both statements above are true, I feel that the sdk shouldn't let me write such code, that will generate invalid state. Should there be an extra test for this?laurent
11/09/2022, 5:11 PMstream.__write__state_message()
just after this line https://github.com/meltano/sdk/blob/main/singer_sdk/tap_base.py#L382 to fix the bugaaronsteers
11/09/2022, 6:09 PM• the final state message should not containYes. That's exactly correct. This sounds like a clear bug in the SDK.at allprogress_markers
•The definition seems legal and valid, but your note above suggests perhaps that the bug is perhaps only occurring whenhas a buggy stream definition, and I need to remove the state partitioning keys definition to use the default valuetap-github
state_partitioning_keys
is overridden.laurent
11/09/2022, 6:36 PMStream.sync_records
the call to finalize_state_progress_markers
does not happen if state_partinioning_keys
is overridden, so the progress markers stay in the state past the end of the method. At the very end, in the line I mentioned above, the state is cleaned/finalised, but there is no further state message issued. If that seems correct to you, I'll open a short PR with the change on the sdk.
I'll see if I can come up with a test for this as well.laurent
11/09/2022, 6:59 PM