dustin_miller
10/01/2021, 11:13 PM
`page` 0-based index for requests, and returns a maximum of 100 items.
Unfortunately, if there are less than 100 items, and you pass a `page` querystring param that should return 0 rows, it merrily returns the complete set of records.
For example, this request (using ClickUp’s Apiary Mock endpoint, so you can run it as-is) returns a list of valid workspaces for the ClickUp Team with a `team_id` of `512`:
curl 'https://private-anon-1ab3e1ce0e-clickup20.apiary-mock.com/api/v2/team/512/space'
Note that there are only two workspaces in the response. With a maximum of 100 records per page, adding a `page` querystring param should return zero records. However, if I add a `page` querystring param (per the API docs) to return the “next” page of records…
curl 'https://private-anon-1ab3e1ce0e-clickup20.apiary-mock.com/api/v2/team/512/space?page=1'
I get back the same two entries.
This is true for live API requests, also.
Is there a recommended approach for dealing with APIs like this? That is, other than submitting a request to ClickUp to fix their API, which I have already done. 😄
aaronsteers
10/01/2021, 11:18 PM
aaronsteers
10/01/2021, 11:20 PM
aaronsteers
10/01/2021, 11:20 PM
aaronsteers
10/01/2021, 11:21 PM
`next_page_token` to `None` any time `num_records < records_per_page` - but it doesn't solve for when there are exactly 100 records on the last page - which should be true approximately 1% of the time.
aaronsteers
10/01/2021, 11:25 PM
`next_page_token` a hash of all the records received per request. Then, checking the last hash against the new hash will tell you the new 100 records are exactly the same as the prior 100, in which case you could then treat the result as 0 records.
aaronsteers
10/01/2021, 11:26 PM
dustin_miller
10/01/2021, 11:27 PM
`id` of the first record returned would work, but that depends on the sort order being consistent from one request to another (not all endpoints support a `sort_by` param)
dustin_miller
10/01/2021, 11:27 PM
dustin_miller
10/01/2021, 11:28 PM
aaronsteers
10/01/2021, 11:29 PM
`id` to confirm/question the approach, but yes, you can count the records and stop when n<100 - and also use any heuristic to keep from looping when the last page has 100 items.
dustin_miller
10/01/2021, 11:30 PM
What’s the default behavior if an ostensibly unique `primary_keys` result is repeated within a given stream? Last one wins?
aaronsteers
10/01/2021, 11:30 PM
`next_page_token` - and you'll have the prior token when you're evaluating the next one. So, for this use case, you'll probably want to make it a dict so you can keep more detailed info from one page to the next.
dustin_miller
10/01/2021, 11:31 PM
aaronsteers
10/01/2021, 11:32 PM
`if next_page_token` - so as long as it's not empty or `None` or similar, anything else will keep the flow going - and then you can smartly compare whatever vars are needed.
dustin_miller
10/01/2021, 11:33 PM
dustin_miller
10/01/2021, 11:33 PM
aaronsteers
10/01/2021, 11:33 PM
What’s the default behavior if an ostensibly unique `primary_keys` result is repeated within a given stream? Last one wins?
Yes - but I'd be slightly worried about deduping before the merge. On some targets they create a batch and merge upsert and/or insert the result to the target table. Many dedupe also before the merge upsert, but that could vary. ("I'm not sure" is the shorter answer.)
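To make the "last one wins" question concrete, here is a minimal Python sketch of deduping a batch by primary key before it would be merged. The function name `dedupe_last_wins` and the sample records are hypothetical and not part of any tap, target, or SDK. Whether a given target actually behaves this way is exactly what the reply above is unsure about.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, Any]


def dedupe_last_wins(records: Iterable[Record], primary_keys: Sequence[str]) -> List[Record]:
    """Keep one record per primary key; a later duplicate replaces an earlier one."""
    latest: Dict[Tuple[Any, ...], Record] = {}
    for record in records:
        # Build a hashable key from the primary-key fields of this record.
        key = tuple(record[name] for name in primary_keys)
        # Assigning to an existing key overwrites the earlier record.
        latest[key] = record
    return list(latest.values())


if __name__ == "__main__":
    batch = [
        {"id": 1, "name": "first"},
        {"id": 2, "name": "other"},
        {"id": 1, "name": "second"},  # repeats id 1
    ]
    print(dedupe_last_wins(batch, ["id"]))
```

A tap that must work with any target cannot rely on the target doing this, which is why handling duplicates on the tap side comes up next in the thread.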
dustin_miller
10/01/2021, 11:40 PM
Yes - but I’d be slightly worried about deduping before the merge. [snip]
Fair point. If I want this to be usable for any `target-` I’d need to make sure I handle that myself in the tap. Would that best be wrapped into a `stream_map` or would it make sense to put some basic “last one wins” logic (or whatever seems correct) into `post_process` in `client.py`?
aaronsteers
10/02/2021, 1:28 AM
`post_process()` you can simply return `None` to skip a record entirely if it has already been sent.
aaronsteers
10/02/2021, 1:28 AM
aaronsteers
10/02/2021, 1:31 AM
visch
10/02/2021, 5:13 PM
visch
10/02/2021, 5:17 PM
visch
10/02/2021, 5:50 PM
visch
10/02/2021, 5:51 PM
visch
10/02/2021, 5:52 PM
Unfortunately, if there are less than 100 items, and you pass a `page` querystring param that _*should*_ return 0 rows, it merrily returns the complete set of records.
I don't agree completely. The initial page is page 0.
Page 1 I believe acts appropriately in the case where you have <100 tasks in a folder / folderless portion
visch
10/02/2021, 5:53 PM
dustin_miller
10/04/2021, 3:19 PM
dustin_miller
10/04/2021, 3:21 PM
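A rough sketch of the stopping heuristics discussed in this thread: stop on a short page, and keep a hash of the previous page in a dict-shaped token so that a replayed full page can be detected. All names here (`page_hash`, `is_replay`, `build_next_token`, `iter_records`, `fetch_page`) are hypothetical plain-Python helpers, not the Singer SDK's API. The sketch also assumes the API replays the immediately preceding page when asked for a page past the end, which the thread only confirms for the single-page case.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Iterator, List, Optional

PAGE_SIZE = 100  # maximum items per page, as stated in the thread

Record = Dict[str, Any]
Token = Dict[str, Any]


def page_hash(records: List[Record]) -> str:
    """Hash a whole page so two pages can be compared cheaply."""
    payload = json.dumps(records, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_replay(records: List[Record], previous_token: Optional[Token]) -> bool:
    """True when this page is identical to the page fetched just before it."""
    return bool(previous_token) and previous_token["hash"] == page_hash(records)


def build_next_token(
    records: List[Record], previous_token: Optional[Token], page_size: int = PAGE_SIZE
) -> Optional[Token]:
    """Return a dict token for the next request, or None to stop paginating."""
    if is_replay(records, previous_token) or len(records) < page_size:
        return None
    next_page = previous_token["page"] + 1 if previous_token else 1
    return {"page": next_page, "hash": page_hash(records)}


def iter_records(
    fetch_page: Callable[[int], List[Record]], page_size: int = PAGE_SIZE
) -> Iterator[Record]:
    """Yield records page by page, discarding a replayed final page."""
    token: Optional[Token] = None
    while True:
        records = fetch_page(token["page"] if token else 0)
        if is_replay(records, token):
            return  # same page again: treat it as zero new records
        yield from records
        token = build_next_token(records, token, page_size)
        if not token:  # None (or anything falsy) ends the loop
            return


if __name__ == "__main__":
    data = [{"id": i} for i in range(3)]
    # Simulated endpoint that returns the complete set for any page number.
    print(list(iter_records(lambda page: data, page_size=3)))
```

Because the token is a dict, it stays truthy while pagination should continue and carries the extra state (page number and hash) from one request to the next.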