< edgar ramirez mondragon> Great work on <https github com e Meltano #singer-tap-development

Join Slack

<@U06BX9QQ4A3> Great work on <https://github.com/e...

# singer-tap-development

douwe_maan

03/05/2021, 12:02 AM

@edgar_ramirez_mondragon Great work on https://github.com/edgarrmondragon/tap-confluence!

douwe_maan

03/05/2021, 12:02 AM

Is there a particular reason you're overriding all of

RESTStream.request_records

(code) instead of just

get_next_page_token

(code), which is used by the default

request_records

(code)?

douwe_maan

03/05/2021, 12:03 AM

And instead of overriding

authenticator

(code), I think you could only override

http_headers

(code) and call

super()

to get

User-Agent

for free

douwe_maan

03/05/2021, 12:03 AM

You probably already saw all of that, I just want to make sure we're not missing something that makes you need to override and duplicate more than is ideal 🙂

douwe_maan

03/05/2021, 12:06 AM

It looks like we may also want to provide a native

BasicAuthenticator

to prevent taps from having to go through https://github.com/edgarrmondragon/tap-confluence/blob/master/tap_confluence/streams.py#L28-L30 again and again (cc @aaronsteers)

douwe_maan

03/05/2021, 12:10 AM

And I assume you've added a

resources

setting because catalog-based stream selection hasn't been implemented yet? (Right @aaronsteers? Or is that supported now?)

edgar_ramirez_mondragon

03/05/2021, 12:15 AM

Thanks @douwe_maan! Ah, I missed

http_headers

. Let me try that! For

request_records

, I wasn't able to make the base class'

while

loop work with offset pagination. I was thinking of proposing a pluggable paginator similar to

APIAuthenticator

. Header/token-based is relatively common pagination method, but I'd argue offset is even more so. And yeah, I added

resources

mainly just for my own testing purposes syncing only selected streams and I didn't consider doing it the right way with a catalog, but I definitely will.

douwe_maan

03/05/2021, 12:19 AM

For
request_records
, I wasn't able to make the base class'
while
loop work with offset pagination.

Ah, right, because

get_next_page_token

wouldn't have access to your

start

... Sounds like we should pass

next_page_token

(code) to

get_next_page_token

prev_page_token

. Then you could define it as:

Copy code

def get_next_page_token(self, response: requests.Response, prev_page_token) -> Any:
    data = response.json()
    size, limit = data["size"], data["limit"]
    
    if size < limit:
        return None

    return prev_page_token + limit

And/or we could make it pluggable as you suggest. Sounds like it's time for an issue 🙂

douwe_maan

03/05/2021, 12:27 AM

@aaronsteers is on top of everything: https://gitlab.com/meltano/singer-sdk/-/commit/9c48d71b983f8e8b61ec4cc2d9582f82bf122d33

edgar_ramirez_mondragon

03/05/2021, 12:29 AM

amazing!

aaronsteers

03/05/2021, 12:29 AM

I’m in a working session with @John Timeus (Slalom Consulting) where we are handling an advanced pagination challenge in the Power BI tap. Hopefully this also helps on your side as well. 🙂

aaronsteers

03/05/2021, 12:32 AM

Importantly, the “get_next_page_token” also needed the previous token in order to properly get the loop parameters for a complex set of loops.

aaronsteers

03/05/2021, 12:34 AM

I’ll have to look more into the resources and authentication questions once we wrap up this pair coding session.

aaronsteers

03/05/2021, 1:31 AM

@douwe_maan said:

Ah, right, because
get_next_page_token
wouldn’t have access to your
start
... Sounds like we should pass
next_page_token
(code) to
get_next_page_token
as
prev_page_token
…

Wow - I think it’s crazy that I was basically implementing exactly this change, perhaps even as you were typing it. 🙂 I didn’t see your comment until after, but we ran into the same challenge in the Power BI tap and came upon that as the best solve. Here is the working implementation we came up with for Power BI REST API. For this, we needed to keep track of our own keys along the way so we could continue to loop through them according to custom app logic: https://github.com/dataops-tk/tap-powerbi-metadata/blob/fix/remove-extra-logic/tap_powerbi_metadata/streams.py#L77

aaronsteers

03/05/2021, 1:32 AM

@edgar_ramirez_mondragon - It looks like you might be on your way to the solution you need but do let me know if I’ve missed anything on my end!

douwe_maan

03/05/2021, 1:35 AM

@aaronsteers ha, what a coincidence!

edgar_ramirez_mondragon

03/05/2021, 1:49 AM

@aaronsteers just pulled your feature branch and I could use it with a slight change to @douwe_maan’s suggestion to handle the None token:

Copy code

def get_next_page_token(
    self,
    response: requests.Response,
    previous_token: Optional[int],
) -> Optional[int]:
    previous_token = previous_token or 1

    data = response.json()
    size, limit = data["size"], data["limit"]

    if size < limit:
        return None

    return previous_token + limit

aaronsteers

03/05/2021, 4:54 PM

Fantastic! Thanks for the feedback. If you see opportunities to improve, please feel free to leave them in that MR. Not all changes are getting their own Merge Requests at this point, but for significant spec changes, I think it makes sense to have a dedicated place to collect feedback. Once we go to

0.1.0

, the plan would be have everything in the repo's change log and with associated MRs.

Open in Slack

Previous Next