ipfs/kubo

Gateway directory listings should be paginated


Checklist

  • My issue is specific & actionable.
  • I am not suggesting a protocol enhancement.
  • I have searched on the issue tracker for my issue.

Description

To render the directory listing page, go-ipfs sequentially fetches blocks for every directory entry. For large directories, this takes a very long time (see e.g. #7588). The gateway should paginate this listing so that there is a reasonable upper bound on the time it takes to return a response, and so that load is distributed more evenly across gateway fleets. I'd suggest query arguments to control page size and offset, with an upper bound of 20 on the page size (this upper bound could be configurable).
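A minimal sketch of the proposed page-size and offset arguments, written in Python for brevity. The argument names `offset` and `limit` and the helpers `parse_page_args` and `page_of_entries` are hypothetical, since the issue does not name them. It assumes a non-sharded directory, where the entry names come from the directory node itself, so only the per-entry block fetches are bounded by the page size.

```python
from urllib.parse import parse_qs, urlsplit

MAX_PAGE_SIZE = 20  # upper bound suggested in the issue; assumed to be configurable


def _int_arg(query, name, default):
    """Read an integer query argument, falling back to default if absent or malformed."""
    try:
        return int(query[name][0])
    except (KeyError, ValueError):
        return default


def parse_page_args(url, max_page_size=MAX_PAGE_SIZE):
    """Return (offset, limit) from hypothetical ?offset=&limit= query args."""
    query = parse_qs(urlsplit(url).query)
    offset = max(0, _int_arg(query, "offset", 0))
    # Clamp the page size so one response never fetches more than max_page_size child blocks.
    limit = min(max(1, _int_arg(query, "limit", max_page_size)), max_page_size)
    return offset, limit


def page_of_entries(entries, offset, limit):
    """Slice out one page; only these entries would have their blocks fetched."""
    return entries[offset:offset + limit]
```

Clamping an oversized `limit` instead of rejecting it is one option; it keeps requests working while still bounding the work per response.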

lidel commented

This would require a serious refactor of how https://github.com/ipfs/dir-index-html works
(which we already want to do, but is a bigger adventure).

Given that we want to improve IPLD support on gateways (ipfs/in-web-browsers#182),
we should do this type of thing in a generic way that works for all DAG types,
and lazy-load additional Size and Type information when the DAG is unixfs.

I envision replacing unixfs-specific dir-index-html with "IPLD Explorer v2" that shows generic DAG view by default, but has specially-crafted variants for most popular codecs like dag-pb (unixfs) and leverages something like ?format=unixfs-info (#8234) for lazy-loading additional metadata about Size and Type only for items visible on the page.

I agree with the bigger picture, but this also seems relatively low effort and addresses an availability risk. Assuming the "quick fix" is straightforward, I think it makes sense to do both (quick fix now, generic fix later).

We should also add some metrics around this, because from what I can tell, we don't have good visibility into how much this contributes to aggregate metrics such as latency and TTFB.

If we're just trying to make non-sharded directories faster, #8178 is probably a simpler short-term solution.

Eventually, we'll likely need pagination for sharded directories, but that requires adding the ability to "seek", which will need some design work.

Some points discussed with Lidel that may be worth considering:

  • Specify a basic pagination spec that uses query strings, e.g. ?page=0&limit=100
  • If a gateway thinks there are "too many" entries (I think >100), it can send a 302 redirect pointing to the first "page"
  • Applications aware of the pagination API could then request later pages or larger limits
  • The generated HTML directory listing could have buttons for incrementing the page or changing the limit (basic HTML <form>?)
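The redirect flow in the list above could look roughly like this. It is a sketch, not a spec: `listing_response` and `next_page_url` are hypothetical helpers, the parameter names follow the ?page=0&limit=100 example, and the threshold of 100 is the figure floated in the discussion.

```python
from urllib.parse import parse_qs, urlencode

REDIRECT_THRESHOLD = 100  # "too many" entries, per the discussion; assumed tunable


def _nonneg_int_arg(query, name, default):
    """Read a non-negative integer query argument, falling back to default."""
    try:
        return max(0, int(query[name][0]))
    except (KeyError, ValueError):
        return default


def listing_response(path, query_string, entry_count, threshold=REDIRECT_THRESHOLD):
    """Return (status, headers, (page, limit) or None) for a listing request."""
    query = parse_qs(query_string)
    # Clients that did not ask for a page get redirected to page 0 of a large listing.
    if "page" not in query and entry_count > threshold:
        location = path + "?" + urlencode({"page": 0, "limit": threshold})
        return 302, {"Location": location}, None
    page = _nonneg_int_arg(query, "page", 0)
    limit = _nonneg_int_arg(query, "limit", threshold) or threshold  # limit=0 -> default
    return 200, {}, (page, limit)


def next_page_url(path, page, limit, entry_count):
    """Link target for a "next page" button or form, or None on the last page."""
    if (page + 1) * limit >= entry_count:
        return None
    return path + "?" + urlencode({"page": page + 1, "limit": limit})
```

Pagination-aware clients can skip the redirect by sending `page` (and a larger `limit`) up front, while a plain browser hitting a huge directory lands on the first page.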

The pagination should try to use regular traversal and account for whatever ADLs exist at that point including HAMTs.
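One way to read "regular traversal" is to page over a lazy iterator of entries, so the same code works whether the entries come from a plain dag-pb directory or from a HAMT walk. A sketch under that assumption, with `page_from_traversal` as a hypothetical helper: it stops pulling entries once the page is full, but it still walks every entry before the requested page, which is the gap a real "seek" would have to close.

```python
from itertools import islice


def page_from_traversal(entries, page, limit):
    """Return one page from an iterator of directory entries.

    `entries` is assumed to be a lazy traversal (e.g. a walk over HAMT shards);
    islice stops consuming it after the last entry of the requested page, so
    later shards are never visited. Earlier entries are still walked.
    """
    start = page * limit
    return list(islice(entries, start, start + limit))
```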

lidel commented

I also wrote some notes about an alternative approach, which essentially removes the need for pagination: #9058 and included both in HTML Gateway specs under best practices section (ipfs/specs@9fc9a9c)