Pagination feature feedback
sindresorhus opened this issue ยท 32 comments
Try out the new pagination feature and let us know how it works. We will improve it based on your feedback. The feature lets you more easily handle the case where you need to request multiple pages from an API.
Here are relevant issues:
Just used this with the WordPress Rest API and it just works. I've come back to the documentation to check because it was too easy and I don't know if I believe it! I feel like I've missed something - I didn't even have to specify any of the callbacks.
let posts = await got.paginate(`${process.env.CRAWL_URL}wp-json/wp/v2/posts`);
console.log(posts.length); // outputs 152
for await (const post of posts) {
...
}
I think you confused got.paginate
with got.paginate.all
. The correct example is:
let posts = await got.paginate.all(`${process.env.CRAWL_URL}wp-json/wp/v2/posts`);
console.log(posts.length); // outputs 152
for (const post of posts) {
...
}
Otherwise posts.length
will be undefined as posts
would be an iterator.
Yes sorry you are right, the example I pasted isn't quite what I've ended up with - I trimmed it down for pasting into here and effectively broke it. got.paginate.all
looks good though, I hadn't seen that.
Thanks!
An amazing feature, works great with GitLab API.
However, it doesn't play well with responseType
and resolveBodyOnly
options. If I have got
instance with options { responseType: 'json', resolveBodyOnly: true }
and even provide _pagination.transform
without JSON.parse
, got
crashes on
Line 79 in 74d66b2
I've tried the pagination feature out today on a Google API and it works really nicely. Thank you for implementing this โค๏ธ
I have a couple of bits of feedback:
- It would be nice to somehow type the response body, as the current types are defaulting to
Response<unknown>
, meaning type casting is necessary inside thepaginate
andtransform
functions. - It's not really obvious what the difference is between
pagination
andpagination.all
. After much trial and error I realised that the former is if you want to usefor await
on the returned generator, whereas the latter must beawait
ed directly and returns the regular array of results.
As a sidenote, I would imagine the major use case for the pagination feature is for pagination.all
, i.e. get all of the results from the endpoint? for await
syntax is not as well known, and (unless I'm missing something) will likely only be used in advanced scenarios. Hence, would it make sense for pagination.all
to be the recommended way of paginating? And, if so, should it be used as the normal pagination
function, with the for await
version being renamed to (e.g.) pagination.each
? I came to this expecting pagination
to work just like pagination.all
does today, given that most of the rest of got
has this architecture (call the function, await it, get the result directly).
Edit: after refining the code, I am now using pagination
directly with for await
after all. However my comment above still stands, that I would expect the pagination
API to work like the rest of the library by default, with an opt-in to the for await
version.
For reference here's my code snippet for interfacing with the Google My Business API, using TypeScript.
interface Account {
name: string;
accountName: string;
}
interface AccountsListResponseBody {
accounts: Account[];
nextPageToken?: string;
}
async function fetchLocations(accessToken: string) {
const myBusinessAPIClient = got.extend({
prefixUrl: 'https://mybusiness.googleapis.com/v4',
headers: {
authorization: `Bearer ${accessToken}`,
},
responseType: 'json',
});
const accountsSearchParams = {
pageSize: 20,
};
const accounts = await myBusinessAPIClient.paginate.all<Account>(
'accounts',
{
searchParams: accountsSearchParams,
pagination: {
transform: (response) =>
(response.body as AccountsListResponseBody).accounts,
paginate: (response) => {
const { nextPageToken } = response.body as AccountsListResponseBody;
if (!nextPageToken) {
return false;
}
return {
searchParams: {
...accountsSearchParams,
pageToken: nextPageToken,
},
};
},
},
}
);
console.log(accounts);
}
It would be nice to somehow type the response body, as the current types are defaulting to Response, meaning type casting is necessary inside the paginate and transform functions.
You can (and should) cast it via a template:
Lines 53 to 63 in 3da16e0
It's not really obvious what the difference is between pagination and pagination.all. After much trial and error I realised that the former is if you want to use for await on the returned generator, whereas the latter must be awaited directly and returns the regular array of results.
all
stands for all items
, which obviously is an array.
As a sidenote, I would imagine the major use case for the pagination feature is for pagination.all, i.e. get all of the results from the endpoint? for await syntax is not as well known, and (unless I'm missing something) will likely only be used in advanced scenarios.
I strongly disagree. People may want to use pagination(...)
instead of pagination.all(...)
to transform data on the fly. Storing 10k items in memory is not a good idea.
for await version being renamed to (e.g.) pagination.each?
IMO that's a good idea. +1 on this one. You can open an issue about this so we don't forget.
I came to this expecting
That's what many people do. They expect their code to just work. If they don't use TypeScript, they should always read the docs unless they have memorized the API. Related with #920
I am using paginate on Sugar CRM api with Typescript node applications.
First thing, it's a brilliant feature which would benefit from more documentation, example, etc. (then again it is still maturing)
Using got@10.x it works like a charm (aside some minor typing things maybe and possibly a heap crash when processing a large set of pages, but I have not fully investigate this yet)
Migrating to got@11.x is promising but currently broken for my usage. I opened several issues. Once I can have got@11.x to work, I will likely contribute a got template for Sugar CRM.
Thanks for the many great repos.
@szmarczak thank you for your comments.
You can (and should) cast it via a template
Yes, that is a good feature - however the casting is only for the list of results. In some cases (such as the Google My Business API), the response body is an object with a property containing the list, like so:
{
"accounts": [...],
"nextPageToken": "some_token"
}
I believe the typings currently account for this possible difference, but they use the Response
type without a generic, and so the response body defaults to Response<unknown>
- hence the need for the manual casting inside transform
and paginate
.
got/source/as-promise/types.ts
Lines 73 to 81 in 3da16e0
That's what many people do. They expect their code to just work.
I understand where you are coming from, but I actually did read the documentation on the pagination API several times, and still didn't fully understand until I had stepped through the code to see what was going on. I think this is simply an issue with the documentation - I would be happy to submit a PR to improve this if it would be acceptable. As also discussed above, I think that having pagination.all
and pagination.each
as the two methods would also help to clear up any misunderstandings.
I hope my original comment didn't offend in any way - I think this is a fantastic feature which has saved me a lot of time, and for that I thank all of the contributors. ๐
In some cases (such as the Google My Business API), the response body is an object with a property containing the list, like so:
That's an interesting one. But I think you can just return response.body.accounts
in transform()
and before you do that just response.request.options.context = {token: nextPageToken};
. Then you can access the token in the paginate()
function.
hence the need for the manual casting inside transform and paginate.
Exactly!
I think this is simply an issue with the documentation - I would be happy to submit a PR to improve this if it would be acceptable.
A PR would be lovely, don't hestitate to send one ;)
I hope my original comment didn't offend in any way
No, it did not. You don't have anything to worry about. Actually your points are good!
But I think you can just return response.body.accounts in transform() and before you do that just response.request.options.context = {token: nextPageToken};. Then you can access the token in the paginate() function.
Indeed, this is totally possible, it's just that TypeScript complains as response.body
is currently typed as unknown
in the transform
function, as there is no way to pass through the response body typings.
Here is a screenshot of VSCode complaining:
Casting manually in transform
fixes the complaint:
I wonder if we could pass in a second (optional) generic to paginate
, which could be passed down to the Response
type? This would solve the issue I believe.
Something like:
export type OptionsWithPagination<T = unknown, B = unknown> = Merge<Options, PaginationOptions<T, B>>;
export interface GotPaginate {
<T, B>(url: string | URL, options?: OptionsWithPagination<T, B>): AsyncIterableIterator<T>;
}
export interface PaginationOptions<T, B> {
pagination?: {
transform?: (response: Response<B>) => Promise<T[]> | T[];
paginate?: (response: Response<B>, allItems: T[], currentItems: T[]) => Options | false;
};
}
A PR would be lovely, don't hestitate to send one ;)
I will try to put something together this week. ๐
Casting manually in transform fixes the complaint:
Can you try
transform: (response: Response<AccountsListResponseBody>) => {
I wonder if we could pass in a second (optional) generic to paginate, which could be passed down to the Response type? This would solve the issue I believe.
๐
I'm trying to use this new feature but can't make it work, not sure what I'm doing wrong ๐
Some feedback:
paginate.all
is not mentioned in the docs, and it's not clear thatpaginate
returns an async iterator. Had to go through issues to discover that- The
paginate
example is wrong:
(async () => {
const limit = 10;
const items = got.paginate('https://example.com/items', {
searchParams: {
limit,
offset: 0
},
pagination: {
paginate: (response, allItems, currentItems) => {
const previousSearchParams = response.request.options.searchParams;
const {offset: previousOffset} = previousSearchParams;
if (currentItems.length < limit) {
return false;
}
return {
searchParams: {
...previousSearchParams,
offset: previousOffset + limit,
}
};
}
}
});
})();
2 issues here:
previousSearchParams
is an object that does not work properly with destructuring (I get symbols but not the intended object)previousOffset
is a string and not a number, it needs to be parsed
Even though I fixed those 2 issues, the pagination does not work properly for me, search params keep getting appended each iteration, ie:
https://dev.to/api/articles/me/all?per_page=2&page=1
https://dev.to/api/articles/me/all?per_page=2&page=2&per_page=2&page=1
https://dev.to/api/articles/me/all?per_page=2&page=3&per_page=2&page=2&per_page=2&page=1
https://dev.to/api/articles/me/all?per_page=2&page=4&per_page=2&page=3&per_page=2&page=2&per_page=2&page=1
...
This is my code, any help appreciated here:
const result = await got.paginate.all(`${apiUrl}/articles/me/all`, {
searchParams: { per_page: paginationLimit, page: 1 },
headers: { 'api-key': devtoKey },
responseType: 'json',
pagination: {
paginate: (response, allItems, currentItems) => {
const previousPage = Number(response.request.options.searchParams.get('page'));
if (currentItems.length < paginationLimit) {
return false;
}
return {
searchParams: {
per_page: paginationLimit,
page: previousPage + 1
}
};
}
}
});
paginate.all is not mentioned in the docs
Nice find. Please open an issue about this.
it's not clear that paginate returns an async iterator.
I disagree: https://github.com/sindresorhus/got#gotpaginateurl-options
2 issues here:
Another one ๐ Please make an issue for this too.
This is my code, any help appreciated here:
ATM I'm fixing bugs, will take a look later.
I disagree: https://github.com/sindresorhus/got#gotpaginateurl-options
This is where I landed from a google search, and the first occurence of it in the docs: https://github.com/sindresorhus/got#pagination
Maybe adding a reference to the full function below would help.
Can you try
transform: (response: Response) => {
This isn't the nicest, as the Response
type is not exported from the default got
export. Instead, you'd need to do
import { Response } from 'got/dist/source/core';
Totally do-able, just doesn't look very nice ๐
got/source/as-promise/types.ts
Line 43 in 46cd61b
got/source/as-promise/index.ts
Line 264 in 46cd61b
Line 128 in 46cd61b
You sure?
I've hit another interesting use case today when working with the Facebook Graph API.
As per the Facebook docs, the API returns the entire 'next' page URL in the response body. The idea is that you only need this URL to fetch the next page of results, saving the hassle of constructing it yourself.
Taking the endpoint me/accounts
as an example. A request can be made as follows:
const { body } = await got.get('https://graph.facebook.com/v6.0/me/accounts', {
searchParams: {
access_token: '<access_token>',
fields: 'id,name',
limit: 1
},
responseType: 'json'
});
Here is the response body for this request:
{
"data": [
{
"id": "<id>",
"name": "<name>"
}
],
"paging": {
"cursors": {
"before": "<cursor_before>",
"after": "<cursor_after>"
},
"next": "https://graph.facebook.com/v6.0/<id>/accounts?access_token=<redacted>&fields=id%2Cname&limit=1&after=<cursor_after>"
}
}
Notice that paging.next
returns the entire URL that should be used to fetch the next set of data. This URL contains an after
search param cursor which is used to paginate.
So now let's hook up the pagination API:
const accounts = await got.paginate.all('https://graph.facebook.com/v6.0/me/accounts', {
searchParams: {
access_token: '<access_token>',
fields: 'id,name',
limit: 1
},
responseType: 'json',
pagination: {
transform: (response) => response.body.data,
paginate: (response) => {
const { paging } = response.body;
if(!paging.next) {
// As per the Facebook docs, if there is no `next`, there is no more data.
return false;
}
// Otherwise, we should return the whole URL.
return {
url: paging.next
};
}
}
});
This almost works, but the issue is that the specified searchParams
in the options takes precedence over the query string parameters in the paging.next
url that is set in the returned url
property. As a result, the after
search param which is present in paging.next
is ignored.
The workaround is to return both a url and a blank string as the searchParams from the paginate
function:
return {
url: paging.next,
searchParams: '' // undefined / null / {} does not work here, it has to be a blank string
};
This is also an issue for other options such as prefixUrl
, which need to be set to the empty string to ensure that got does not include them in subsequent paged requests.
I wonder if there needs to be a way for got to accept a full URL as a return from the paginate
function, and to use that URL as-is (with no merging of previous options). I imagine the system that Facebook has in place is not atypical.
You sure?
Sorry you are right. VSCode didn't seem to pick this up via auto importing. Thanks for clarifying ๐
VSCode didn't seem to pick this up via auto importing
Sometimes I experience this too with other TS projects.
This almost works, but the issue is that the specified searchParams in the options takes precedence over the query string parameters in the paging.next url that is set in the returned url property.
That works as intended. It's in the docs. searchParams
override the ones in the URL. But an empty string should still replace the query string in the URL. Setting that to undefined
should fix this. If it doesn't, please make an issue about this.
This is also an issue for other options such as prefixUrl
That works as intended too. You can avoid this behavior by passing url: new URL(next)
.
Setting that to undefined should fix this. If it doesn't, please make an issue about this.
Strangely, setting to undefined
causes the original searchParams
to be added to the URL again for the second page. I don't have enough data to test against subsequent pages but I suspect the params would continue to be duplicated for each subsequent page. Setting to ''
(empty string) was the only thing I could get to work.
I will submit an issue.
You can avoid this behavior by passing url: new URL(next)
Thanks for the tip! That works nicely.
There are two more relevant issues:
Any feedback is greatly appreciated โค๏ธ
For now I simply do the following until I wrap my head around pagination
const clusters = await got.get(url, options);
response.send(JSON.parse(clusters.body).value);
great module! thanks!
Would it make sense for got.paginate
to act as an event emitter or a readable stream?
We use pagination to query various rest API which will typically return thousands of records, and most of the time the goal is "per record" processing/transformation. In such use cases, we would rather have got.paginate
not to produce a final result but to consume records (or chunks of records) as they arrive.
One possible approach (and perhaps recommended) could be to use pagination.filter
:
- to emit an event or push into a readable stream
- filter out the record (
got.paginate
needs not waste resources in stacking "processed" records)
Working at the pagination.transform
level would allow emitting "chunks" of records (at this point I am unsure if working with chunks or single records would be more resource efficient)... but it feel messy/kludgy.
Is this use case meaningful to others? Should it be covered by the documentation with some sample code? Would extending the pagination API to propose event emitter or readable stream interface be a valuable feature request?
Edit
There is of cause the issue of closing the stream / emitting the done
ending event which I have not yet fullythought over...
to act as an event emitter
You can use the functions in options.pagination
. Currently I don't see any other usage, but if you see one, please share :)
or a readable stream?
Well, in that case you can just make your own loop. got.paginate
is just a convenient wrapper around the Promise API.
You can use the functions in
options.pagination
. Currently I don't see any other usage, but if you see one, please share :)
I will experiment on a large set of data this week, with custom pagination.filter
to emit data to process and pagination.paginate
to emit a done
closing event
Well, in that case you can just make your own loop.
got.paginate
is just a convenient wrapper around the Promise API.
Would a convenient wrapper around the stream API make sense for got
and be worth a PR? The idea came to me with how node-mssql
allows the result of queries to be processed as streams. But maybe not everyone is interested ;)
I just noticed that shouldContinue
executes only when filter
passes. So something like this does not work:
import got from '../../dist/source/index.js';
import Bourne from '@hapi/bourne';
const max = Date.now() - 1000 * 86400 * 7;
const iterator = got.paginate('https://api.github.com/repos/sindresorhus/got/commits', {
pagination: {
transform: response => Bourne.parse(response.body),
filter: ({item}) => {
const date = new Date(item.commit.committer.date);
return date.getTime() - max >= 0;
},
shouldContinue: ({item}) => {
const date = new Date(item.commit.committer.date);
return date.getTime() - max >= 0;
},
countLimit: 50,
backoff: 100,
requestLimit: 1_000,
stackAllItems: false
}
});
console.log('Last 50 commits from now to week ago:');
for await (const item of iterator) {
console.log(item.commit.message.split('\n')[0]);
}
I mean it works but it makes 40 requests instead of 2. The workaround here is:
paginate: data => {
const found = data.currentItems.find(item => {
const date = new Date(item.commit.committer.date);
return date.getTime() - max >= 0;
});
if (found) {
return false;
}
return got.defaults.options.pagination.paginate(data);
}
We could make the first approach do only 2 requests if shouldContinue
was executed on all items.
@sindresorhus What do you think?
This is also an issue for other options such as prefixUrl, which need to be set to the empty string to ensure that got does not include them in subsequent paged requests.
@fiznool Indeed it makes sense to disable prefixUrl
here. Will fix this now. Actually it has been fixed in the main
branch (upcoming Got 12).
We could make the first approach do only 2 requests if shouldContinue was executed on all items.
In what way would we do that? Would it be a breaking change? If so how?
Yes, it would be a breaking change. There is a scenario where filter
may be used to validate responses and shouldContinue
may be used on validated repsonses only. Let's leave it as it is for now.
If anybody is concerned about this issue, please open a new one. I'm closing this one as pagination got stable and v12 is being released very soon.