Hi team! I’m working on expanding `tap-github` and...
# singer-tap-development
e
Hi team! I’m working on expanding
tap-github
and implemented a
community_profile
tap. However, some repos are returning a 404 for this query and I would like to tell the tap to just move on. I tried to tweak the
post_process
function but without luck. Where would you recommend adding a check for the 404? And maybe returning an empty response instead of an error? Thanks!! If you’re curious, here is my PR - https://github.com/oviohub/tap-github-meltano/pull/3
a
Thanks for being interested in contributing and for sharing your issue. I had to check the reference myself but I think your best bet may be overriding
parse_response()
https://sdk.meltano.com/en/latest/classes/singer_sdk.RESTStream.html#singer_sdk.RESTStream.parse_response
If your failure happens prior to
parse_response()
being called, then an upstream (probably internal/private) method would need to be overloaded. This seems to me very similar to the backoff/retry logic customization question where rather than having a hard failure and retry, you actually want to continue gracefully.
Here's the code snippet that covers this window:
Copy code
prepared_request = self.prepare_request(
                context, next_page_token=next_page_token
            )
            resp = self._request_with_backoff(prepared_request, context)
            for row in self.parse_response(resp):
                yield row
If
_request_with_backoff()
gives a hard stop, you'd need to overload that instead and/or overload the backoff logic. Both of these are possible but not thoroughly explored. Certainly something we'd be open to exploring in an issue and/or a new MR if you end up not being able to implement this by overloading public methods alone.
@eric_boucher does this help?
e
Looking into it now 🙂
Looks like it’s failing before
parse_response
in
_request_with_backoff
a
Okay. Thanks, @eric_boucher for confirming. You can override
_request_with_backoff()
to solve this, but of course we'd like to find a generic+supportable solution for this. Would love it if you felt like opening an issue and/or MR with ideas on how to give devs ability to handle scenarios like this one.
e
For now, do you have a preference between overriding _request_with_backoff and request_records since it is public? It means that we will probably do a few unnecessary requests though since the backoff mechanism would kick in
a
Thanks for asking! My preference would actually be if you could override
_request_with_backoff()
since in theory we could later incorporate just your changes/improvements into an MR that does the same
_request_with_backoff()
could in theory call something like
handle_response()
immediately after the response is received and the http metric is reported. Then custom implementations of
handle_response()
could introduce custom logic, call the
super()
or not, and finally return the response object or throw an error. I think this is where we'd also add something like raising custom rate limit exceptions (retryable or not retryable).
Now we're getting pretty in-the-weeds, would probably be good to capture this in an issue before going too much deeper. Good luck and let use know how it goes! 🙂
e
I updated my PR - https://github.com/oviohub/tap-github-meltano/pull/3/files I can also target it directly to Meltano’s repo if that’s easier I made the changes in the GitHub stream client for now as it seems a bit specific to GitHub API’s weirdness… But we could move it to the singer SDK in the future. • I introduced a new property
tolerated_http_errors
• and created an override for
_request_with_backoff
• finally I improved the GitHubStream
parse_response
a bit but it could probably split into a handle and a parse to make things a bit cleaner if we wanted to
a
Super helpful. Thanks!!
p
I think this is related to this sdk issue also, discussions about more precisely handling errors and dead letter queues for failed https://gitlab.com/meltano/sdk/-/issues/134
e
@aaronsteers I opened a few more PRs with new GitHub streams. Let me know if you have any questions 🙂
a
Thanks, @eric_boucher! FYI @laurent and @ryan_samarakoon: new streams coming soon. Feel free to add comments on any pending MRs. Especially interesting is this question on whether Issues should also include PRs once the PR stream is added.
e
For context - @ryan_samarakoon @laurent and I all work together at Ovio.org 🙂
a
Oh 😄 Right!
e
@aaronsteers do you know if the stream filter is done on the pre or post-processed record? Regarding https://github.com/MeltanoLabs/tap-github/pull/21, I really like your filtering idea, but the github API doesn’t return a “type” by default. I think we should add one by detecting the type with something like:
Copy code
row["type"] = 'pull_request' if row["pull_request"] else 'issue'