Hi all. I think I need some help with a bug I foun...
# singer-tap-development
l
Hi all. I think I need some help with a bug I found in
tap-github
and state management. Specifically, I think that the tap is never issuing a proper "end" state message. I'm running it manually outside of any other code, just to isolate things a bit. I run it on a single stream (
issue_comments
) for a single repo. The last
STATE
message I get is
Copy code
{
  "type": "STATE",
  "value": {
    "bookmarks": {
      "tempStream": {},
      "repositories": {
        "partitions": [
          {
            "context": {
              "org": "nextcloud",
              "repo": "server",
              "repo_id": 60243197
            }
          }
        ]
      },
      "issue_comments": {
        "partitions": [
          {
            "context": {
              "org": "nextcloud",
              "repo": "server"
            },
            "replication_key_signpost": "2022-11-09T16:44:57.220041+00:00",
            "starting_replication_value": "2022-11-07",
            "progress_markers": {
              "Note": "Progress is not resumable if interrupted.",
              "replication_key": "updated_at",
              "replication_key_value": "2022-11-09T16:41:17Z"
            }
          }
        ]
      }
    }
  }
}
What bugs me is that
progress_markers
should not be in there, from what I understand.
finalize_state_progress_markers
should remove this, and promote
replication_key_value
one level up, so that it's used on the next run. But somehow it does not happen and our tap runs the entire stream everytime. This is with sdk 0.13.1 and the latest
master
on the tap. However if I remove
state_partitioning_keys = ["repo", "org"]
from the stream definition, then things work as I think they should. Can someone confirm my understanding? • the final state message should not contain
progress_markers
at all •
tap-github
has a buggy stream definition, and I need to remove the state partitioning keys definition to use the default value • If both statements above are true, I feel that the sdk shouldn't let me write such code, that will generate invalid state. Should there be an extra test for this?
Actually, I think there should be a
stream.__write__state_message()
just after this line https://github.com/meltano/sdk/blob/main/singer_sdk/tap_base.py#L382 to fix the bug
a
Hi, @laurent - yes, this definitely sounds like a bug.
• the final state message should not contain
progress_markers
at all
Yes. That's exactly correct. This sounds like a clear bug in the SDK.
tap-github
has a buggy stream definition, and I need to remove the state partitioning keys definition to use the default value
The definition seems legal and valid, but your note above suggests perhaps that the bug is perhaps only occurring when
state_partitioning_keys
is overridden.
l
@aaronsteers yes, I think what is happening is that in the loop in
Stream.sync_records
the call to
finalize_state_progress_markers
does not happen if
state_partinioning_keys
is overridden, so the progress markers stay in the state past the end of the method. At the very end, in the line I mentioned above, the state is cleaned/finalised, but there is no further state message issued. If that seems correct to you, I'll open a short PR with the change on the sdk. I'll see if I can come up with a test for this as well.