# troubleshooting
x
Hi everyone, I'm using the MeltanoLabs variant of tap-github. Here's my config:
```yaml
- name: tap-github
  variant: meltanolabs
  config:
    flattening_enabled: false
    repositories:
      - XXXXXX
      - XXXXXX
    start_date: '2020-01-01'
  select:
    - commits.*
    - events.*
    - reviews.*
    - issues.*
    - pull_request_commits.*
    - pull_requests.*
```
It runs fine for about 40 minutes then hits the following error.
```
singer_sdk.exceptions.RetriableAPIError: 403 Client Error: b'{"message":"API rate limit exceeded for user ID 12345. If you reach out to GitHub Support for help, please include the request ID XXXXXXX and timestamp 2025-07-01 15:15:50 UTC.","documentation_url":"https://docs.github.com/rest/overview/rate-limits-for-the-rest-api","status":"403"}' (Reason: Forbidden) for path: /repos/XXXXXXX/pulls/12345/commits
```
I understand this is due to a GitHub API rate limit being hit. The question is: is there a way around this while still pulling the full history of data? tap-github doesn't seem to support an `end_date` parameter, and it doesn't seem to support a `limit` either. What I'm hoping for is for the task to run successfully and for the bookmark to increment, so that even if it takes multiple runs over many hours, it's possible to get through the full history chunk by chunk. It's also not super clear to me what `rate_limit_buffer` does.
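For what it's worth, here's where I'd expect that setting to sit; my working assumption (not verified against the tap's source) is that it's the number of API quota points to keep in reserve per token:
```yaml
- name: tap-github
  variant: meltanolabs
  config:
    # assumption: the tap stops using a token once its remaining
    # quota drops below this buffer, rather than draining it to zero
    rate_limit_buffer: 1000
```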
v
Have you read through the readme here https://github.com/MeltanoLabs/tap-github ?
x
Yes, I've read the README. It isn't very clear on what the rate limit buffer does, and that appears to be the only mechanism to limit a run. There is no parameter like `end_date` or `max_records`. I thought I'd ask here before digging into the code and experimenting with that rate limit buffer parameter.
👍 2
v
Have you tried `additional_auth_tokens`?
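Something roughly like this (a sketch; the env var names are placeholders):
```yaml
- name: tap-github
  variant: meltanolabs
  config:
    auth_token: ${GITHUB_TOKEN}  # primary personal access token
    additional_auth_tokens:      # extra tokens the tap can rotate through
      - ${GITHUB_TOKEN_2}
      - ${GITHUB_TOKEN_3}
```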
x
I have actually used that, but the rate limit is hitting at the user level, so additional tokens created by me don't help. Yes, I could find other people to create tokens and give them to me, or create an org token that has a higher rate limit, but that felt like a very clumsy way around the issue. I was hoping for some mechanism to do the one-off backfill in chunks.
👍 1
v
What I'd do is try one of the streams and see if it works. Then maybe try a GitHub App instead; I think the limits are larger.
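For reference, this is GitHub's documented flow for minting an app installation token; the app ID, installation ID, and key path are placeholders, and you'd hand the resulting token to the tap (e.g. as `auth_token`):
```python
# Mint a GitHub App installation token (GitHub's documented App auth flow).
# Installation tokens draw on a separate, often larger, rate-limit pool
# than a user's personal access tokens.
# Requires: pip install pyjwt cryptography requests
import time

import jwt  # PyJWT
import requests

APP_ID = "123456"            # placeholder: your GitHub App ID
INSTALLATION_ID = "7890123"  # placeholder: the app's installation on your org

with open("github-app.private-key.pem", "rb") as f:
    private_key = f.read()

now = int(time.time())
app_jwt = jwt.encode(
    {"iat": now - 60, "exp": now + 540, "iss": APP_ID},  # JWT must live <= 10 min
    private_key,
    algorithm="RS256",
)

resp = requests.post(
    f"https://api.github.com/app/installations/{INSTALLATION_ID}/access_tokens",
    headers={
        "Authorization": f"Bearer {app_jwt}",
        "Accept": "application/vnd.github+json",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["token"])  # short-lived token; pass it to the tap
```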
e
@xiaozhou_wang do you happen to know if the records in any or all of these streams come sorted by the replication key? If that's the case, then we could set `is_sorted = True` in their stream classes to make interruptions safer.
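Roughly what I mean, sketched against the Meltano SDK (the class and field names here are illustrative, not tap-github's actual code):
```python
# If a stream's records really do arrive ordered by the replication key,
# declaring is_sorted lets the SDK advance the bookmark as records flow,
# so an interrupted sync resumes from the last record instead of the
# start of the stream. The SDK errors on out-of-order records, so only
# set this when the API guarantees ordering.
from singer_sdk import typing as th
from singer_sdk.streams import RESTStream


class CommitsStream(RESTStream):
    name = "commits"
    url_base = "https://api.github.com"
    path = "/repos/{org}/{repo}/commits"  # illustrative path
    primary_keys = ["sha"]
    replication_key = "commit_timestamp"  # illustrative field
    is_sorted = True                      # records sorted by replication key

    schema = th.PropertiesList(
        th.Property("sha", th.StringType),
        th.Property("commit_timestamp", th.DateTimeType),
    ).to_dict()
```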
x
@visch Thanks, I think those are good suggestions. A GitHub App is still not an elegant solution, but at least it's moving in the direction of being less hacky than multiple personal access tokens. I'll give those a go.
@Edgar Ramírez (Arch.dev) I just had a look at the underlying GitHub APIs. I think the challenge is that there isn't much consistency across them, so I can see it's a pain from a developer point of view: https://docs.github.com/en/rest?apiVersion=2022-11-28 List Commits uses `since` and `until`, which are timestamp filters. List Pull Requests uses `sort` plus page size / page number. List Reviews has neither (although typically there will be fewer of these than PRs).
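To illustrate, this is the kind of windowed backfill the commits endpoint allows with its documented `since`/`until` parameters (the repo and token are placeholders, and persisting the records is elided):
```python
# Walk the commit history in 30-day windows so each chunk stays small
# and a failed run only loses the current window, not the whole backfill.
from datetime import datetime, timedelta, timezone

import requests

TOKEN = "ghp_..."    # placeholder
REPO = "owner/repo"  # placeholder

window_start = datetime(2020, 1, 1, tzinfo=timezone.utc)
now = datetime.now(timezone.utc)

while window_start < now:
    window_end = min(window_start + timedelta(days=30), now)
    page = 1
    while True:
        resp = requests.get(
            f"https://api.github.com/repos/{REPO}/commits",
            headers={"Authorization": f"Bearer {TOKEN}"},
            params={
                "since": window_start.isoformat(),
                "until": window_end.isoformat(),
                "per_page": 100,
                "page": page,
            },
            timeout=30,
        )
        resp.raise_for_status()
        commits = resp.json()
        if not commits:
            break
        # ...persist commits here, then record window_end as the bookmark...
        page += 1
    window_start = window_end
```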
Likely the smallest-lift way to tackle this issue is just a parameter that enables backoff. It should come with a warning that it might need to wait an hour or more for the API limit to reset, resulting in a job that runs continuously for hours. However, that only wastes server time, which is usually better than wasting human time.
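Roughly the behaviour I have in mind, sketched with GitHub's documented rate-limit headers (the tap would do this internally; this is just an illustration):
```python
# On a rate-limited response, sleep until the X-RateLimit-Reset epoch
# instead of failing the run. The wait can approach an hour.
import time

import requests


def get_with_rate_limit_backoff(url: str, token: str) -> requests.Response:
    while True:
        resp = requests.get(
            url,
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        rate_limited = (
            resp.status_code in (403, 429)
            and resp.headers.get("X-RateLimit-Remaining") == "0"
        )
        if rate_limited:
            reset_at = int(resp.headers.get("X-RateLimit-Reset", "0"))
            wait = max(reset_at - time.time(), 0) + 5  # small safety margin
            print(f"Rate limited; sleeping {wait:.0f}s until the quota resets")
            time.sleep(wait)
            continue
        resp.raise_for_status()
        return resp
```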
💯 1