ipfs/kubo

Gateway support for /ipfs/{cid}?format=car|raw|...

Closed this issue · 10 comments

lidel commented

@mikeal @olizilla @autonome @warpfork @aschmahmann – this is a quick memo with my initial synthesis of the ?format= idea. Bit thin on details, but want to get early feedback/temperature check before I start elaborating on this in gateway specs in the following weeks.

This is a meta issue for streamlining various feature requests and needs under a single opt-in query parameter that enables gateway users to fetch a specific representation of a specific content path.

Support for each format can be discussed/added via a separate issue/PR – this issue is just for tracking the bigger picture around the unified format parameter.

Note: if you need CARs from an ipfs gateway today, POST to /api/v0/dag/export?arg=<cid>, see: https://docs.ipfs.io/reference/http/api/#api-v0-dag-export

MVP formats

Ability to fetch every CID as full DAG in CAR or a single Block

This is the key feature to enable Verifiable Gateway Responses and "HTTP-based transport for IPFS" (mobile browsers, IoT) without introducing even more dependency on /api/v0, and giving us flexibility for adding new features in the future.

  • ?format=car – implemented in #8758
    • Returns binary stream with CAR archive for entire DAG behind the content path
    • Supersedes /api/v0/dag/export, but with better UX:
      • works on DNSLink websites that do not expose /api/v0
      • content-disposition defaults to {filename|cid}.car
  • ?format=block ?format=raw – implemented in #8758
    • Returns binary array with the root block identified by CID
    • Supersedes /api/v0/block/get, but with better UX:
      • works on DNSLink websites that do not expose /api/v0
      • content-disposition defaults to {cid}.bin
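A minimal sketch of how a client could build these request URLs and predict the default download name. The helper names (gateway_url, default_filename), the example.com origin and the bafyexample CID are hypothetical, and reading {filename|cid} as "filename if one is known, otherwise the CID" is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode


def gateway_url(gateway: str, cid: str, fmt: Optional[str] = None) -> str:
    """Build {gateway}/ipfs/{cid}, optionally with an explicit ?format=."""
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    if fmt is not None:
        url += "?" + urlencode({"format": fmt})
    return url


def default_filename(cid: str, fmt: str, filename: Optional[str] = None) -> str:
    """Default content-disposition name as described in the list above."""
    if fmt == "car":
        # Assumption: {filename|cid} means "filename if known, else the CID".
        return f"{filename or cid}.car"
    if fmt in ("block", "raw"):
        return f"{cid}.bin"
    raise ValueError(f"no default filename defined for format {fmt!r}")


print(gateway_url("https://example.com", "bafyexample", "car"))
print(default_filename("bafyexample", "raw"))
```

The same URL shape works on a DNSLink website, since nothing here depends on /api/v0 being exposed.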

CBOR / JSON

Moved to #8823

Future ideas / lower priorities

  • ?format=tar|zip
  • ?format=share-img
  • ?format=unixfs-stats
    • similar to share-img but in dag-json format, could provide information about Size and Type (dir/file) and be leveraged for efficient pagination of huge directories (#8455, #8528)

Behaviors

  • ?format missing
    • if codec is dag-pb or raw return file/directory
      (current gateway behavior)
    • else (codec without default behavior), return error suggesting passing ?format=car|block|..
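The decision described above can be sketched as a small function. This is an illustration only, not gateway code: the function name, the "file-or-directory" label and the exception type are hypothetical.

```python
from typing import Optional

# Codecs that already have a default gateway behavior (file/directory).
CODECS_WITH_DEFAULT = {"dag-pb", "raw"}


class NoDefaultFormat(Exception):
    """Raised when a CID's codec has no default response representation."""


def default_response(codec: str, fmt: Optional[str]) -> str:
    """Return the representation to serve for a CID codec and optional ?format=."""
    if fmt is not None:
        # An explicit ?format= always decides the representation.
        return fmt
    if codec in CODECS_WITH_DEFAULT:
        # Current gateway behavior: render the file or directory.
        return "file-or-directory"
    raise NoDefaultFormat(
        f"codec {codec!r} has no default response; try ?format=car or ?format=block"
    )


print(default_response("dag-pb", None))
print(default_response("dag-cbor", "car"))
```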

I like this querystring approach a lot. You can easily imagine extending this with new parameters for partial dag queries and the like.

One thing I’d like to specify is how an implementation exposes what formats it does and does not support. Then clients can implement fallback logic in order to be more robust and servers aren’t required to implement every feature ever.

It looks promising! Does it matter that it'd be mixing codecs (?format=dag-json) and containers (?format=car)? How do we deal with versioning: would CARv2 be ?format=carv2? It's not called out in the issue, but is the plan to also honour an Accept header if provided?

else (codec without default behavior), return error suggesting passing ?format=car|block|..

can a dag-json be returned as json without needing to specify the format? I know this has come up before, but I can't recall why that would be bad.

and can has mime/types ipld/specs#368

but I can't recall why that would be bad.

@olizilla relevant 🧵
#8037 (comment)

lidel commented
  • On default behavior
    • car and block would work for every CID, other things like json or cbor will be available per-codec-basis
    • When the requested CID is not available in the specified format (or does not have a default one), a human-readable error message suggesting appending explicit ?format=dag-json|dag-cbor|block|car is returned.
    • To facilitate things like automated fallback or format discovery via HEAD request, the list of supported formats could be included as HTTP header such as Link from ipfs/in-web-browsers#179:
      Link: <ipfs://bafy?format=block>; rel=describedby; type="application/octet-stream"
      Link: <ipfs://bafy?format=car>; rel=describedby; type="application/octet-stream"    
      Link: <ipfs://bafy?format=dag-json>; rel=describedby; type="application/json"
      Link: <ipfs://bafy?format=dag-cbor>; rel=describedby; type="application/cbor"
      
    • The concern about default behavior was about binary dag-cbor (thread linked above by @ribasushi).
      • For dag-cbor the default state will be error, as there is no obvious default response, and we may want to render some GUI on gateways in the future.
      • Returning application/json for dag-json by default (without explicit ?format) is ok.
  • On mixing codecs and containers: my reasoning for a single ?format= is that both codecs and containers translate to distinct response formats. An HTTP client does not care about IPFS-specific taxonomy; it requests a specific thing in a specific format (Accept or ?format=) and gets it.
    • Sidenote: I used explicit dag-cbor and dag-json just to highlight that CIDs will be traversable thanks to IPLD conventions, but we may shorten this to json and cbor to improve UX.
  • On CAR versions: AFAIK we have built-in versioning in CARs: CARv2 includes the version in the header in a backward-compatible way, so a CARv1 parser will return "unsupported version" for CARv2. Due to this I see no need for versioning here.
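A rough sketch of the client-side fallback that the Link headers above would allow. It assumes the gateway sends one Link value per supported format, shaped exactly like the four examples; the function names and the preference order are hypothetical, and the regex handles only this simple shape, not the full Link header grammar.

```python
import re
from typing import Dict, Iterable, List, Optional
from urllib.parse import parse_qs, urlsplit


def supported_formats(link_values: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each advertised ?format= value to its advertised media type."""
    formats: Dict[str, Optional[str]] = {}
    for value in link_values:
        match = re.match(r'\s*<([^>]*)>(.*)', value)
        if not match:
            continue
        target, params = match.groups()
        fmt = parse_qs(urlsplit(target).query).get("format")
        media_type = re.search(r'type="([^"]*)"', params)
        if fmt:
            formats[fmt[0]] = media_type.group(1) if media_type else None
    return formats


def pick_format(preferred: List[str], available: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the first preferred format the server advertises, if any."""
    for fmt in preferred:
        if fmt in available:
            return fmt
    return None


links = [
    '<ipfs://bafy?format=block>; rel=describedby; type="application/octet-stream"',
    '<ipfs://bafy?format=car>; rel=describedby; type="application/octet-stream"',
    '<ipfs://bafy?format=dag-json>; rel=describedby; type="application/json"',
]
available = supported_formats(links)
print(pick_format(["dag-cbor", "dag-json", "block"], available))
```

With these three headers the client wanted dag-cbor first, but falls back to dag-json because dag-cbor is not advertised.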

Would it make sense to also look at the Accept header for content types that the application might be expecting? e.g. if Accept contains application/json, return that for the URL. Feels like it'd be closer to what REST APIs already do and might fit well with some tooling.
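A minimal sketch of what Accept-based selection might look like on the server side. The media type mapping reuses the types from the Link examples earlier in the thread; the function name is hypothetical, and letting an explicit ?format= win over Accept is an assumption rather than anything decided here.

```python
from typing import Optional

# Assumed mapping, taken from the media types in the Link header examples.
MEDIA_TYPE_TO_FORMAT = {
    "application/json": "dag-json",
    "application/cbor": "dag-cbor",
}


def format_from_accept(accept: str, query_format: Optional[str] = None) -> Optional[str]:
    """Pick a response format from ?format= (if present) or the Accept header."""
    if query_format:
        # Assumption: an explicit query parameter overrides the Accept header.
        return query_format
    candidates = []
    for position, item in enumerate(accept.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if media_type in MEDIA_TYPE_TO_FORMAT and quality > 0:
            # Highest q wins; earlier position breaks ties.
            candidates.append((-quality, position, MEDIA_TYPE_TO_FORMAT[media_type]))
    return min(candidates)[2] if candidates else None


print(format_from_accept("text/html, application/json;q=0.9"))
```

Returning None leaves the gateway free to fall back to its default behavior for the CID's codec.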

lidel commented

Food for thought (cc @warpfork): mixing IPLD codec names with formats like car and block could be confusing to users.
Perhaps we should make IPLD codec override (because one is already in the CID) more explicit:

# fetch full DAG or a single block
?format=car 
?format=block

# request response parsed using implicit IPLD lens (assume multicodec name when format is unknown)
?format=dag-json
?format=dag-cbor
?format=raw # same output as ?format=block (?)

# request response parsed using explicit IPLD lens
?format=ipld&codec=dag-json
?format=ipld&codec=dag-cbor

# TBD - surface for IPLD selector queries
 ?format=ipld&codec=dag-json&selector={inlined_selector}
 ?format=ipld&codec=dag-cbor&selector={cid_of_a_complex_selector}

This explicit notation provides enough keywords to be self-explaining, and fairly easy to reason about their purpose without forcing users to read the docs.

For daily use, we could add porcelain in the form of a shorter notation where ?format=foo for unsupported foo will evaluate as ?format=ipld&codec=foo
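The shorthand expansion could be sketched like this. The set of built-in formats and the function name are hypothetical, and the sketch only rewrites query parameters; it does not check that the codec actually exists.

```python
from typing import Dict
from urllib.parse import parse_qs

# Formats handled directly; anything else is treated as an IPLD codec name.
BUILTIN_FORMATS = {"car", "block", "ipld"}


def normalize_query(query: str) -> Dict[str, str]:
    """Expand ?format=foo into ?format=ipld&codec=foo for non-built-in foo."""
    params = {key: values[0] for key, values in parse_qs(query.lstrip("?")).items()}
    fmt = params.get("format")
    if fmt is not None and fmt not in BUILTIN_FORMATS:
        params["format"] = "ipld"
        params.setdefault("codec", fmt)
    return params


print(normalize_query("?format=dag-json"))
print(normalize_query("?format=car"))
```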

what ever happened to this?

lidel commented

@mikeal prioritization / limited bandwidth within stewards group – fleshing out details is still on the roadmap as part of gateway spec work, which I hope to get back to this quarter.

👉 If someone has bandwidth to make this happen sooner – I am all ears, happy to sync.

lidel commented

Note: car code was added in multiformats/multicodec#258 and the discussion around its meaning and purpose continues in multiformats/multicodec#239 (comment)

My take / question: can we use the codec field of CIDv1 to indicate expected format/transformation when requesting data from Gateways?

[..] convention where raw and car codecs are used on HTTP Gateway as a way of requesting a single Block or a CAR with blocks for a DAG.

  • HTTP GET /ipfs/{cid-with-raw-codec} returning a raw Block
  • HTTP GET /ipfs/{cid-with-car-codec} returning a CAR with the entire DAG behind a CID

In this convention the multihash in a CID represents the root block of a DAG, and if you plan to use car [code] with a multihash that has different meaning, we should agree on that now.

We could play it safe and use ?format=car (or shorter ?as=car) for now,
but things may be more intuitive if ?format=car returns a redirect to CIDv1 with car codec, and that would return a CAR stream.
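To make the redirect idea concrete, here is a sketch that swaps the codec field of a CIDv1 while keeping the multihash, which is what "CIDv1 with car codec" would mean under this convention. It handles only base32 ("b"-prefixed) CIDv1 strings, the helper names are hypothetical, and the multicodec codes used (raw 0x55, car 0x0202) should be checked against the multicodec table before relying on them.

```python
import base64
import hashlib
from typing import Tuple

RAW, CAR = 0x55, 0x0202  # multicodec codes (assumed from the multicodec table)


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used for the CID version and codec fields."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_cidv1(cid: str) -> Tuple[int, bytes]:
    """Split a base32 CIDv1 string into (codec, multihash bytes)."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 strings")
    body = cid[1:].upper()
    data = base64.b32decode(body + "=" * (-len(body) % 8))
    version, pos = decode_varint(data)
    if version != 1:
        raise ValueError("not a CIDv1")
    codec, pos = decode_varint(data, pos)
    return codec, data[pos:]


def encode_cidv1(codec: int, multihash: bytes) -> str:
    data = encode_varint(1) + encode_varint(codec) + multihash
    return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")


def redirect_target(cid: str) -> str:
    """Path a gateway might redirect /ipfs/{cid}?format=car to."""
    _, multihash = decode_cidv1(cid)
    return "/ipfs/" + encode_cidv1(CAR, multihash)


# Example CID built from a sha2-256 multihash (0x12 = sha2-256, 0x20 = 32 bytes).
multihash = bytes([0x12, 0x20]) + hashlib.sha256(b"hello").digest()
raw_cid = encode_cidv1(RAW, multihash)
print(raw_cid, "->", redirect_target(raw_cid))
```

The multihash is carried over unchanged, so the redirected CID still points at the same root block.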

lidel commented

Block/CAR response types are implemented in #8758 – ready for review, plan is to ship it in go-ipfs 0.13.