Get some info about UnixFS objects on public IPFS HTTP API
d70-t opened this issue · 2 comments
Checklist
- My issue is specific & actionable.
- I am not suggesting a protocol enhancement.
- I have searched on the issue tracker for my issue.
Description
I am implementing a backend to access IPFS via the Python library fsspec at ipfsspec. To do so (and to save me from implementing the IPFS protocol in Python), the plan is to access UnixFS files and directories on IPFS via an HTTP gateway. An fsspec backend needs to implement a function info(path) which must return:
- if the thing behind a path is a directory or a file
- if it is a file, the size of the file
To me, this seems to be a reasonable requirement for other generic filesystem abstractions as well, thus I assume that this feature request could be of broader interest.
While the /v0/files/stat endpoint provides this kind of information, it is often not reachable on public gateways.
Another option to obtain this information is to perform a HEAD request towards http://gateway/ipfs/CID, which in case of a file provides the size in the content-length header and which (seemingly) lets me discriminate between file and directory using the etag header. This method works on some public gateways, but scares me as well, as this doesn't seem to be the right use of observable API features.
I see three possible ways to obtain the desired functionality:
- It is already implemented and I didn't find it?
- Move / replicate the files/stat API to public gateway port (could this be GET as well?)
- Implement and document HTTP headers which include this information and are to be returned when /ipfs/CID is requested
Tagging @whyrusleeping as I've been talking to him about this already on slack.
I understand you want to build something future-proof, and robust.
The long term direction is that we will be removing /api/v0 (subset of go-ipfs' RPC over HTTP, never designed to be exposed on the web) from public gateways and enhancing content paths at /ipfs/{cid} with necessary APIs.
Detecting a directory today (go-ipfs 0.10)
If you want to implement something against how go-ipfs gateways are today, your best option to detect a directory is sending an HTTP HEAD request. IF content-type is text/html AND Etag starts with DirIndex-, then it is a directory listing. While it feels awkward, it is a robust and future-proof check: directory listings will always be returned as HTML by default, and the response requires this custom Etag for cache control, to avoid potentially mutable HTML being cached forever like we do with immutable files under /ipfs/.
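The check described above could be sketched in Python along these lines. This is only a sketch under stated assumptions: the function name info_from_headers is hypothetical, the returned dict follows the name/type/size shape that fsspec info() results use, and stripping a weak-validator prefix and surrounding quotes from the Etag value is an assumption about how the header appears on the wire, not something this thread confirms.

```python
def info_from_headers(path: str, headers: dict) -> dict:
    """Classify a gateway HEAD response as file or directory from its headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    content_type = h.get("content-type", "")
    etag = h.get("etag", "").strip()

    # Assumption: the Etag may arrive as W/"..." or "..."; drop that wrapping.
    if etag.startswith("W/"):
        etag = etag[2:]
    etag = etag.strip('"')

    # Directory listing: HTML content type AND an Etag starting with DirIndex-.
    if content_type.startswith("text/html") and etag.startswith("DirIndex-"):
        return {"name": path, "type": "directory", "size": None}

    # Otherwise treat it as a file; the size comes from content-length if present.
    size = h.get("content-length")
    return {
        "name": path,
        "type": "file",
        "size": int(size) if size is not None else None,
    }
```

Requiring both conditions means an ordinary HTML file stored on IPFS is still reported as a file, since its Etag would not carry the DirIndex- prefix.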
Future
In the future, in addition to the Etag way, we most likely will have /ipfs/{cid}?format=dag-json which will return the dag-pb root block serialized into a deterministic JSON format that could be cached forever, and/or /ipfs/{cid}?format=unixfs-stats parameter which will have Type (dir/file).
We are already tracking ?format= in #8234, but let's keep this one open to ensure it includes the ability to get unixfs directories in a more efficient manner.
Feature scope
MVP is to make it possible to send request to /ipfs/{cid}[?format] where CID is dag-pb (unixfs) and get:
- deterministic dir listing as JSON that can be cached forever (cache-control: public, max-age=29030400, immutable)
- type (file/directory)
- size (data, data+envelopes)
- links (dir, big file)
Related proposal: add Ipfs-DagSize and Ipfs-DataSize to gateway responses.
If someone needs this, please raise support in the linked issue, or propose an IPIP against the ipfs/specs repo.