ipfs/kubo

Get some info about UnixFS objects on public IPFS HTTP API

d70-t opened this issue · 2 comments

d70-t commented

Checklist

  • My issue is specific & actionable.
  • I am not suggesting a protocol enhancement.
  • I have searched on the issue tracker for my issue.

Description

I am implementing a backend for accessing IPFS through the Python library fsspec, in the ipfsspec project. To do so (and to avoid implementing the IPFS protocol in Python), the plan is to access UnixFS files and directories on IPFS via an HTTP gateway. An fsspec backend needs to implement a function info(path), which must return:

  • whether the thing behind a path is a directory or a file
  • if it is a file, the size of the file

To me, this seems to be a reasonable requirement for other generic filesystem abstractions as well, thus I assume that this feature request could be of broader interest.
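
For reference, fsspec conventionally reports this as a dictionary with name, size, and type keys. A minimal sketch of the result info(path) would need to produce is shown below; make_info is a hypothetical helper name (not part of fsspec or ipfsspec), and reporting a size of 0 for directories is an assumption:

```python
from typing import Optional


def make_info(path: str, is_directory: bool, size: Optional[int] = None) -> dict:
    """Build an fsspec-style info dictionary (hypothetical helper)."""
    if is_directory:
        # Assumption: directories report a size of 0.
        return {"name": path, "size": 0, "type": "directory"}
    # For files, size is the file length in bytes (None if unknown).
    return {"name": path, "size": size, "type": "file"}
```

The open question in this issue is how to obtain is_directory and size from a public gateway in a supported way.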

While /v0/files/stat provides this kind of information, this endpoint is often not reachable on public gateways.

Another option is to send a HEAD request to http://gateway/ipfs/CID. For a file, the response provides the size in the content-length header, and the etag header (seemingly) lets me distinguish between a file and a directory. This method works on some public gateways, but it worries me, as it doesn't seem to be the intended use of observable API features.

I see three possible ways to obtain the desired functionality:

  • It is already implemented and I didn't find it?
  • Move / replicate the files/stat API to the public gateway port (could this be GET as well?)
  • Implement and document HTTP headers which include this information and are to be returned when /ipfs/CID is requested

Tagging @whyrusleeping, as I've already been talking to him about this on Slack.

lidel commented

I understand you want to build something future-proof and robust.

The long-term direction is that we will be removing /api/v0 (a subset of go-ipfs' RPC over HTTP, never designed to be exposed on the web) from public gateways and enhancing content paths at /ipfs/{cid} with the necessary APIs.

Detecting a directory today (go-ipfs 0.10)

If you want to implement something against go-ipfs gateways as they are today, your best option for detecting a directory is to send an HTTP HEAD request. If content-type is text/html and Etag starts with DirIndex-, then it is a directory listing. While it feels awkward, it is a robust and future-proof check: directory listings will always be returned as HTML by default, and the response requires this custom Etag for cache control, so that potentially mutable HTML is not cached forever the way immutable files under /ipfs/ are.
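
A minimal Python sketch of that check, using only the standard library. The names classify_head and head_info are hypothetical, and the handling of a quoted or weak (W/) Etag value is an assumption about how the header may be formatted on the wire, not something this thread specifies:

```python
import urllib.request
from typing import Mapping


def classify_head(headers: Mapping[str, str]) -> dict:
    """Apply the rule above to HEAD response headers (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    content_type = lowered.get("content-type", "")
    # Assumption: the Etag value may be quoted and may carry a weak "W/" prefix.
    etag = lowered.get("etag", "").strip()
    if etag.startswith("W/"):
        etag = etag[2:]
    etag = etag.strip('"')
    if content_type.startswith("text/html") and etag.startswith("DirIndex-"):
        return {"type": "directory", "size": None}
    # Otherwise treat it as a file and take the size from content-length.
    length = lowered.get("content-length")
    return {"type": "file", "size": int(length) if length is not None else None}


def head_info(gateway: str, cid: str) -> dict:
    """Send a HEAD request to {gateway}/ipfs/{cid} and classify the response."""
    request = urllib.request.Request(f"{gateway}/ipfs/{cid}", method="HEAD")
    with urllib.request.urlopen(request) as response:
        return classify_head(dict(response.headers.items()))
```

Keeping the header logic in a pure function makes it testable without a network connection; head_info only adds the request itself.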

Future

In the future, in addition to the Etag way, we most likely will have /ipfs/{cid}?format=dag-json which will return the dag-pb root block serialized into a deterministic JSON format that could be cached forever, and/or /ipfs/{cid}?format=unixfs-stats parameter which will have Type (dir/file).

We are already tracking ?format= in #8234, but let's keep this one open to ensure it includes the ability to get unixfs directories in a more efficient manner.

Feature scope

The MVP is to make it possible to send a request to /ipfs/{cid}[?format] where CID is dag-pb (unixfs) and get:

  • deterministic dir listing as JSON that can be cached forever (cache-control: public, max-age=29030400, immutable)
  • type (file/directory)
  • size (data, data+envelopes)
  • links (dir, big file)
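
As a small illustration of the cache lifetime in the first bullet, the following sketch parses the max-age directive from that cache-control value and converts it to weeks. parse_max_age is a hypothetical helper, and the parsing is deliberately simplified (it does not handle quoted directive values):

```python
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


header = "public, max-age=29030400, immutable"
seconds = parse_max_age(header)
# 29030400 seconds is exactly 48 weeks (336 days).
weeks = seconds / (7 * 24 * 60 * 60)
```

Together with the immutable directive, this tells caches they can keep the response without revalidating it for the whole period.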

lidel commented

Related proposal: add Ipfs-DagSize and Ipfs-DataSize to gateway responses.
If someone needs this, please raise support in the linked issue, or propose an IPIP against the ipfs/specs repo.