ipfs/in-web-browsers

Tackle identifying origins with (or without?) dweb: paths

flyingzumwalt opened this issue · 16 comments

This is part of the address scheme work described in #3

The underlying requirement:

Firefox, for example, implements https://url.spec.whatwg.org/, not the URL RFC. That's what we need to use if we want our URLs to be web-browser compatible.

Background

From @Gozala:

At least in Electron there is no way to make the origin anything other than a hostname, which means all the IPFS content will have either an “fs://ipfs” or an “fs://ipns” origin. I have started messing with Gecko and I think it may be possible to make the origin different from the hostname, but even then I won’t be surprised if things don’t quite work out as expected due to implicit assumptions that the hostname is the origin. Either way I’d encourage rethinking addressing, as I suspect you’ll get pushback from browser vendors, not only due to implementation difficulties but more due to introducing a new model that would work differently from the established one.

Option: include CID in the origin domain

@Gozala (2017-02-01)

I have followed a rabbit hole of implementing a protocol handler for Firefox that would handle fs://ipfs/${cid}/path/with-in such that the origin would be fs://ipfs/${cid}, but unfortunately my fear got confirmed: it’s not doable without making fundamental changes to the Firefox code base, specifically to the parts that deal with content security policy. That is bad because I expect it to be a very hard sell given the implications it could have on millions of Firefox users.

@jbenet

  • Hopefully it can be done with re-routing the data flow to fit what firefox expects.
  • We can always define schemes like ipfs://${cid}/path/... ipns://${cid}/path/... ipld://${cid}/path/... if that's so much easier, just note it will make other things hard. Ultimately it's about tradeoffs and what we can get away with.
  • We've been considering changing fs:// to dweb://, which is clearer and more inclusive of other projects, and avoids repeating "fs" so much (e.g. dweb://ipfs/${cid}/path/...).
  • Ultimately, i'm confident we can find a scheme and setup that works for FF, Chrome, other browsers, IPFS, and everyone. It may require changes on our side, or clever re-routing of info (as you've been exploring).

@Gozala (2017-02-06):

It would be invaluable to have those things listed somewhere.

@Gozala (2017-02-01)

Along the way I got some feedback from the people intimately familiar with the relevant code paths in firefox:

The only visible way to implement something like that would be to roll out a new C++-implemented component along the lines of nsIStandardURL and patch the nsScriptSecurityManager component so that the origin is computed differently for that type of URL. Then also change the nsNetUtil component, which is actually responsible for validating whether a resource is within the policy (for example, you can see that different origin checks are performed for the file: protocol).

Option: include public key in the paths

@Gozala (2017-02-06)

Dat for instance is free of this problem as they just use dat://{public_key}/path/with/in so origin is what they want it to be.

origin domains must be case-insensitive

@Gozala:

I tried ipfs://${hash} and ipns://${id} as an alternative solution to make things work in Electron. The issue there is that hostnames are case-insensitive and the hashes used by ipfs are case-sensitive by default (base58 encoded). Presumably addresses that are not all lowercase could be transcoded to base16 to avoid this issue, but even then it is not going to be ideal, as a user may be given an address encoded with base58 and, say, posting it as a link won’t work as expected. Not sure what the best solution is here, but ideally all content addresses will be valid.
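
The case folding can be reproduced with any URL parser that treats the authority as a hostname. A minimal sketch in Python, assuming `urllib.parse` is an acceptable stand-in for the browser behaviour described above; the hash is only a sample mixed-case string:

```python
from urllib.parse import urlsplit

# Sample mixed-case, base58-style hash (illustrative only).
cid = "QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"

parts = urlsplit(f"ipfs://{cid}/path/with-in")

# .netloc keeps the text as written; .hostname is case-folded.
print(parts.netloc)
print(parts.hostname)

# Two differently cased strings collapse to the same hostname,
# so the original casing cannot be recovered from it.
other = cid.swapcase()
assert other != cid
assert urlsplit(f"ipfs://{other}/").hostname == parts.hostname
```

Once a handler only sees the hostname form, the base58 value is gone, which is the situation described for Electron.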

@jbenet:

What if we resolve through to a CIDv1 encoded in the right base (16 or 32) non-transparently? meaning that we actually resolve through from

fs://ipfs/${CIDv0 or CIDv1 in any base}/path -> ipfs://${CIDv1 in base16 or base32}/path
fs://ipns/${CIDv0 or CIDv1 in any base}/path -> ipns://${CIDv1 in base16 or base32}/path
fs://ipld/${CIDv0 or CIDv1 in any base}/path -> ipld://${CIDv1 in base16 or base32}/path
ipfs://${CIDv0 or CIDv1 in any base}/path -> ipfs://${CIDv1 in base16 or base32}/path
ipns://${CIDv0 or CIDv1 in any base}/path -> ipns://${CIDv1 in base16 or base32}/path
ipld://${CIDv0 or CIDv1 in any base}/path -> ipld://${CIDv1 in base16 or base32}/path

so that the browser can treat ${CIDv1 in base16 or base32} as the origin hostname?
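
A rough sketch of the conversion step this implies, assuming a CIDv0 is a base58btc-encoded sha2-256 multihash and that a CIDv1 is a multibase prefix followed by version, codec and multihash. The function names are made up for illustration and are not an existing IPFS API:

```python
import base64
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58btc (used here only to build a sample CIDv0)."""
    number = int.from_bytes(data, "big")
    digits = ""
    while number:
        number, remainder = divmod(number, 58)
        digits = B58_ALPHABET[remainder] + digits
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def b58decode(text: str) -> bytes:
    """Decode a base58btc string into bytes."""
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def cidv0_to_cidv1(cid_v0: str, base: str = "base32") -> str:
    """Re-encode a CIDv0 as a lowercase CIDv1 in base16 or base32."""
    multihash = b58decode(cid_v0)
    # A CIDv0 is a bare sha2-256 multihash: 0x12 (sha2-256), 0x20 (32 bytes).
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError(f"not a CIDv0: {cid_v0!r}")
    cid_bytes = b"\x01\x70" + multihash  # version 1, dag-pb codec
    if base == "base16":
        return "f" + cid_bytes.hex()  # multibase prefix "f"
    if base == "base32":
        encoded = base64.b32encode(cid_bytes).decode("ascii")
        return "b" + encoded.lower().rstrip("=")  # multibase prefix "b"
    raise ValueError(f"unsupported base: {base}")


# Build a sample CIDv0 from a known digest so the result can be checked.
digest = hashlib.sha256(b"hello").digest()
sample_v0 = b58encode(b"\x12\x20" + digest)
print(sample_v0)
print(cidv0_to_cidv1(sample_v0, "base16"))
print(cidv0_to_cidv1(sample_v0, "base32"))
```

Both outputs are all lowercase, so they survive hostname normalization unchanged.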

a working solution

@Gozala (2017-02-01)

Now good news is with David Diaz’s help and necessary fixes I was able to work out a solution which works as follows:

fs, ipfs and ipns protocol handlers are added to firefox.
fs protocol handler essentially just redirects to either ipfs or ipns as follows

fs://ipfs/${cid}/path/with-in/ -> ipfs://${cid_v1_base16}/path/with-in
fs:ipfs/${cid}/path/with-in/ -> ipfs://${cid_v1_base16}/path/with-in
fs:///ipfs/${cid}/path/with-in/ -> ipfs://${cid_v1_base16}/path/with-in
fs:/ipfs/${cid}/path/with-in/ -> ipfs://${cid_v1_base16}/path/with-in
fs://ipns/${cid}/path/with-in/ -> ipns://${cid_v1_base16}/path/with-in
fs:ipns/${cid}/path/with-in/ -> ipns://${cid_v1_base16}/path/with-in
fs:///ipns/${cid}/path/with-in/ -> ipns://${cid_v1_base16}/path/with-in
fs:/ipns/${cid}/path/with-in/ -> ipns://${cid_v1_base16}/path/with-in
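
The table above could be expressed as a single pattern match. This is only a sketch: `redirect_fs` is a hypothetical name, and `placeholder_base16` is a stand-in for a real CID re-encoding, not an actual conversion:

```python
import re
from typing import Callable

# Accepts fs:ipfs/..., fs:/ipfs/..., fs://ipfs/... and fs:///ipfs/... (same for ipns).
FS_URL = re.compile(r"^fs:/{0,3}(ipfs|ipns)/([^/]+)(/.*)?$")


def redirect_fs(url: str, to_base16: Callable[[str], str]) -> str:
    """Rewrite any fs: spelling to ipfs://<cid16>/path or ipns://<cid16>/path."""
    match = FS_URL.match(url)
    if match is None:
        raise ValueError(f"not a recognised fs: URL: {url!r}")
    protocol, cid, rest = match.group(1), match.group(2), match.group(3) or ""
    # The table above drops the trailing slash, so this sketch does the same.
    return f"{protocol}://{to_base16(cid)}{rest.rstrip('/')}"


def placeholder_base16(cid: str) -> str:
    """Placeholder only: hex of the ASCII text, NOT a real CIDv1 encoding."""
    return "f" + cid.encode("ascii").hex()


for spelling in ("fs://ipfs/QmFoo/path/with-in/",
                 "fs:ipfs/QmFoo/path/with-in/",
                 "fs:///ipfs/QmFoo/path/with-in/",
                 "fs:/ipfs/QmFoo/path/with-in/"):
    print(spelling, "->", redirect_fs(spelling, placeholder_base16))
```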

both ipfs and ipns protocol handlers redirect to corresponding base16 encoded CID path

ipfs://${cid_v0_base58}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:/${cid_v0_base58}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:///${cid_v0_base58}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:${cid_v0_base58}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in

ipfs://${cid_v1}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:/${cid_v1}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:///${cid_v1}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:${cid_v1}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in

same with ipns

both ipfs and ipns protocol handlers serve content from local node (that is assumed to be running), meaning that firefox will show URLs on the left but will serve content from URLs on the right.

ipfs://${cid_v1_base16}/path/with-in => localhost:8080/ipfs/${cid_v0_base58}/path/with-in
ipns://${cid_v1_base16}/path/with-in => localhost:8080/ipns/${cid_v0_base58}/path/with-in
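
A sketch of that last mapping, assuming the host is a base16 CIDv1 with the multibase prefix "f" and that the local gateway listens on localhost:8080 as above. `gateway_url` is a hypothetical helper, and hosts that have no CIDv0 form are passed through unchanged:

```python
import hashlib
from urllib.parse import urlsplit

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58btc."""
    number = int.from_bytes(data, "big")
    digits = ""
    while number:
        number, remainder = divmod(number, 58)
        digits = B58_ALPHABET[remainder] + digits
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def gateway_url(url: str, gateway: str = "http://localhost:8080") -> str:
    """Map ipfs://<cid_v1_base16>/path to <gateway>/ipfs/<cid>/path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme not in ("ipfs", "ipns") or not host.startswith("f"):
        raise ValueError(f"expected ipfs:// or ipns:// with a base16 host: {url!r}")
    cid_bytes = bytes.fromhex(host[1:])  # drop the multibase prefix "f"
    # Version 1 + dag-pb (0x70) + sha2-256 multihash (0x12 0x20) has a CIDv0 form.
    if len(cid_bytes) == 36 and cid_bytes[:4] == b"\x01\x70\x12\x20":
        cid = b58encode(cid_bytes[2:])
    else:
        cid = host  # anything else is passed through unchanged
    return f"{gateway}/{parts.scheme}/{cid}{parts.path}"


# Sample address built from a known digest.
digest = hashlib.sha256(b"hello").digest()
shown = "ipfs://f01701220" + digest.hex() + "/path/with-in"
print(shown, "=>", gateway_url(shown))
```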

As a consequence of all the redirects, everything works under (what I assume to be) the desired origin policy, where the origin is either ipfs://${cid_v1_base16}/ or ipns://${cid_v1_base16}/ respectively.

Just to be clear, the solution described as working only works in Firefox; in Electron you would not be able to do the following:

ipfs:///${cid_v0_base58}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in

The API exposed for protocol handlers is passed an already normalized URL, meaning the hostname (cid_v0_base58) is already lowercased, so there is no way of knowing what the original CID was.

And here is the protocol implementation that I was referring to as a working solution:
https://github.com/Gozala/firefox-ipfs-protocol/blob/master/src/index.js

fs:/ipfs/${cid}/path/with-in/ -> ipfs://${cid_v1_base16}/path/with-in

@Gozala @diasdavid This is already pretty good, but I see two potential issues and a potential fix:

  1. It doesn't retain the original path. We'll run into tricky edge cases reconstructing it, as ipfs-related code will run in all sorts of situations. We should make sure to always have the original canonical path at hand (simplicity).
  2. The way Bitswap works probably requires converting the base16 CID back to the original form. Note that it's not necessarily a v0 base58 CID (starting in Qm) - it can be anything, even another base16 CID. Another reason to keep the original path at hand.

From what I understand, the mapping to a whatwg-compatible URL with an origin happens purely internally, within the fs: protocol handler? If so, we can retain the whole path, and still construct the host from it.

fs:/ipfs/$cid/path/with-in  =>  fs://$cid16/ipfs/$cid/path/with-in

(fs:/ipfs/hash and fs:/ipns/hash would generate the same origin, but that is okay I think.)

Also note that it removes the need for multiple protocols (ipfs:// ipns:// ipld:// foo://). It basically unifies the fs URI and whatwg URL we were mapping between, into one scheme which can work both in an origin-required, and in an origin-less context.

It simply defines the whatwg-compatible, originful fs URI as: set host to cidv1b16($cidFromPath).
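
As a sketch of that definition: the host is derived from the CID in the path, and the path itself is kept verbatim, so the originless form can be recovered by dropping the host. The function names are hypothetical, and `placeholder_cid16` only stands in for a real cidv1b16 conversion:

```python
import re
from typing import Callable
from urllib.parse import urlsplit

# Group 1 is the whole canonical path, group 2 the CID inside it.
FS_PATH = re.compile(r"^fs:(/(?:ipfs|ipns)/([^/]+)(?:/.*)?)$")


def to_originful(fs_uri: str, cidv1b16: Callable[[str], str]) -> str:
    """fs:/ipfs/$cid/path => fs://$cid16/ipfs/$cid/path"""
    match = FS_PATH.match(fs_uri)
    if match is None:
        raise ValueError(f"not an fs: path URI: {fs_uri!r}")
    canonical_path, cid = match.group(1), match.group(2)
    return f"fs://{cidv1b16(cid)}{canonical_path}"


def to_originless(fs_url: str) -> str:
    """Drop the host again; the original path was never altered."""
    return "fs:" + urlsplit(fs_url).path


def placeholder_cid16(cid: str) -> str:
    """Placeholder only: hex of the ASCII text, NOT a real CIDv1 encoding."""
    return "f" + cid.encode("ascii").hex()


url = to_originful("fs:/ipfs/QmFoo/path/with-in", placeholder_cid16)
print(url)
print(to_originless(url))
```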

What do you think, does this make sense?

On the encoding note, I think base32 would work too: https://github.com/neocities/hshca

cc @lidel too

To make a long story short, this choice depends on whether the "protocol handler redirect" in the addon is transparent, or propagates into the UX, e.g. whether it behaves as visibly as an HTTP redirect.

If the redirect is purely internal, we should go with fs://$cid16/ipfs/$cid/path/with-in, where $cid16 is only used to have an appropriate origin, but is ignored when it comes to reading the path. It's verbose, but that's okay if it's not visible. It keeps the path consistent which would be a great win.

If the redirect is visible, we should go with ipfs://$cid16/path/with-in (same for ipns://). It's less verbose. ("We can work on unifying the fs-db-web rift later." -- jbenet).

lidel commented

There are various moving pieces and I am afraid we may be oversimplifying here, but I'd say the redirect is internal, meaning there are off-the-channel transformations made by the add-on before the actual HTTP request to a gateway is made. At least this is what happened in legacy-sdk.

Hard to say how it will work in future.
WebExtensions do not have API for custom protocol handler yet:

Going with fs://$cid16/ipfs/$cid/path/with-in would solve origin issues, but AFAIK there is no API for rewriting address in Location Bar so $cid16 would be always visible.

Let's go with ipfs://$cid32/path/within and ipns://$cid32/path/within -- as discussed in ipfs/specs#152 (comment)

lidel commented

If we go that route, I feel js-ipfs-api should provide CID version detection and conversion as one of its utility functions.

If we go that route, I feel js-ipfs-api should provide CID version detection and conversion as one of its utility functions.

Agreed -- copy-pasting from ipfs/specs#152:

  • e.g. ipfs://$hash
  • Well understood, adhering to WHATWG URL standard
  • Straightforward to implement
  • $hash is host/origin
  • Issues:
    • Hash needs to be base32 encoded, because URL hosts are case-insensitive.
      • Solution 1/2: Needs a redirect to the base32 hash when pasting any non-base32 hashes.
      • Solution 2/2: Probably needs a special base32 CID that retains information about the original non-base32 CID, so that we can avoid confusing UX around changing hashes.
    • Doesn't retain the path, but instead needs conversion step to/from URI.
      • Solution: The URI path can be derived from URL scheme and host and path.

Hi, I just wanted to jump in to say that @heavenlyhash was just visiting me and we discussed this issue, and what we realised, is that it is not possible to have the single slash variants like ipfs:/hash because that's actually a valid POSIX path and NOT a valid URL. It would break web browsers to implement that, so it's just out. Try

$ mkdir -p ipfs://foo
$ realpath ipfs:/foo/
/home/timothy/Pictures2/ipfs:/foo

Note how the second slash just disappears? Since it is not possible to have a directory with a zero length name in POSIX, // is impossible within a path.

However,

$ mkdir -p ipfs:/foo
$ realpath ipfs:/foo/
/home/timothy/Pictures2/ipfs:/foo

can exist. So basically, ipfs:/ with a single slash would be invading the POSIX namespace and that would not be accepted by browser developers (I hope). Right now it is possible to navigate to say /usr/bin by typing that into the address bar, and you're not going to break that feature just for the aesthetic displeasure of a single extra slash...
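
The same collapse can be seen without touching the filesystem. A small sketch using Python's `posixpath.normpath`, which is purely string-based and so only approximates what `realpath` does above:

```python
import posixpath

# The empty component between the two slashes is dropped.
double = posixpath.normpath("ipfs://foo")
single = posixpath.normpath("ipfs:/foo")

print(double)
print(single)

# Both spellings name the same relative POSIX path.
assert double == single == "ipfs:/foo"
```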

@lgierth @lidel is this "tackled"? Should I close the issue, or are there still unresolved details? Have we recorded the conclusions anywhere?

lidel commented

(@lgierth please let me know if I got this right or missed anything here)

Tackled: theory behind Origin's case-insensitivity and CIDs

I think the conclusion was that in an ideal world we would have CID/URL normalization done by browser add-on.

The add-on would detect requests with an unsafe CID (case-sensitive, e.g. base58) and convert them to safe ones (case-insensitive, e.g. base32/16) before sending them to an IPFS gateway.

This way Origin based on first segment after :// would "just work".
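
A sketch of the detection half, assuming the usual multibase prefixes ("z" for base58btc, "b" for lowercase base32, "f" for lowercase base16) and that a CIDv0 is a 46-character string starting with "Qm". The function names are hypothetical and the prefix table is deliberately incomplete:

```python
# Small, non-exhaustive multibase prefix table (assumption: only these matter here).
MULTIBASE_PREFIXES = {"z": "base58btc", "b": "base32", "f": "base16"}


def classify_cid(cid: str) -> str:
    """Return a rough label for how the CID string is encoded."""
    if len(cid) == 46 and cid.startswith("Qm"):
        return "cidv0-base58btc"
    return MULTIBASE_PREFIXES.get(cid[:1], "unknown")


def is_origin_safe(cid: str) -> bool:
    """True when the string survives hostname case folding unchanged."""
    return classify_cid(cid) in ("base32", "base16") and cid == cid.lower()


for sample in ("Qm" + "a" * 44, "bafybei" + "a" * 52, "zb2rhexample"):
    print(sample[:12], classify_cid(sample), is_origin_safe(sample))
```

Anything that is not origin-safe would then go through a re-encoding step before it is used as a host.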

Missing: Utility method in js-ipfs-api for CID detection/conversion

I did not see it being mentioned anywhere, so I assume it is missing.

Missing: WebExtension API for Programmable Protocol Handler

WebExtensions do not provide means for defining Persistent/Programmable Custom Protocol (that stays in address bar and provides proper Origin support). Firefox 54 will only support simple redirects from web+ipfs://<path> to https://ipfs.io/<path>, which breaks Origin barrier.

There is an open ticket at Bugzilla about need for better API for persistent/programmable protocol handler:

That is all I know, hope it helps.

I'd like to contribute some input on the discussion about address schemes.

NURLs?

As far as I can tell, there is a loose definition for NURLs. I'm assuming a NURL is a "Nestable URL" and would look something like either /ipfs/<cid>/path or /https/<domain>/path. I'd like to see a reference to a more solid specification for NURLs as I haven't been able to find one. However, I take it that a NURL is an alternative to a URL which can be embedded within other syntaxes.

dweb vs fs

It is my opinion that neither dweb nor fs is an appropriate URI scheme name. My argument is that both are arbitrary, and in the end it doesn't matter what we name them (either will still "work"). It is my opinion that the URI scheme protocol name should reflect what the address actually is: a URI scheme for NURLs. A nurl:// scheme should be introduced.

NURL URI Scheme

nurl://<origin><nurl>

This is much like the proposed fs:// scheme. However, it can be used with any NURI:

/ipfs/<cid58>/path  =>  nurl://<cid32>/ipfs/<cid58>/path
/http/<domain>/path => nurl://<domain>/http/<domain>/path

The NURL is embedded directly in the URI scheme after a computed origin. The implementation would have full control over the origin, which leads to a decoupling of the NURL address and its origin.

CORS for NURLs

Since NURLs are directly embedded in the scheme, cross-origin requests can be made possible. Content with an origin of <cid58-A> can make a request for content with an origin of <cid58-B> through this conversion:

/ipfs/<cid58-B>/path  =>  nurl://<cid58-A>/ipfs/<cid58-B>/path

Security

It must be noted that for protocols like HTTP that are subject to CSRF attacks, the same-origin policy should be enforced at the NURL URI scheme level. However, the option for an alternative policy is available for protocols like IPFS.

EDIT

See Addendum: #6 (comment)

Base 32 Encodings

Might I suggest the use of Crockford's Encoding for base32 encoded content addresses. It is my opinion that this encoding is the best choice. My reasons:

  1. The encoding includes all numerical digits (0-9); seems reasonable for a number encoding
  2. Begins with numerical digits like hex (0 is a value of zero, 9 is a value of nine, etc).
  3. Prefers numbers over letters (1 over I or L; 0 over O)
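
To make the difference concrete, here is a sketch that re-maps Python's RFC 4648 base32 output onto Crockford's alphabet. Crockford describes an encoding for numbers, so applying it to byte strings with RFC-style bit grouping is an assumption of this sketch, as are the function names:

```python
import base64

RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, U


def crockford_encode(data: bytes) -> str:
    """Base32-encode bytes, then swap each symbol for its Crockford equivalent."""
    standard = base64.b32encode(data).decode("ascii").rstrip("=")
    table = str.maketrans(RFC4648_ALPHABET, CROCKFORD_ALPHABET)
    # Lowercased so the result can sit in a case-insensitive hostname.
    return standard.translate(table).lower()


def crockford_normalize(text: str) -> str:
    """Apply Crockford's decoding aliases: I and L read as 1, O reads as 0."""
    return text.upper().translate(str.maketrans("ILO", "110"))


print(crockford_encode(b"\x00"))
print(crockford_encode(b"\xff"))
print(crockford_normalize("O1lI"))
```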

EDIT

Added notes regarding Base 32 encoding and Crockford's Encoding particularly here: ipfs/kubo#4143 (comment)

Addendum to the topic of NURL

I should note that a NURL is an incorrect reference to a NURI mentioned at ipfs/kubo#1678 (comment). It is worth mentioning that neither NURL nor NURI currently has a formal specification, as far as I am aware.

With that said, I see no reason to choose nurl:// over fs:// as a URI scheme. In fact, the idea for a nurl:// scheme in the way that I have proposed exhausts itself in complexity. Making the scheme support existing protocols like HTTP poses further challenges. For example, a web application loaded from nurl://domain.com/http/domain.com/app which includes a relative path to a resource (such as an image) would require the protocol and hostname prefixed in order to behave as expected. If the same web application is loaded from the http:// URI scheme, the prefixes are not wanted. This makes the nurl:// scheme incompatible with web applications built using existing URI schemes.
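
The relative path problem can be shown with ordinary URL resolution. A sketch, with the caveat that Python's `urljoin` only resolves references for schemes it knows, so the hypothetical `resolve` helper borrows "http" for the resolution step and swaps the scheme back afterwards:

```python
from urllib.parse import urljoin


def resolve(base: str, reference: str) -> str:
    """Resolve a relative reference against base, whatever the scheme is."""
    scheme, rest = base.split("://", 1)
    joined = urljoin("http://" + rest, reference)
    return scheme + "://" + joined.split("://", 1)[1]


over_http = resolve("http://domain.com/app", "/img/logo.png")
over_nurl = resolve("nurl://domain.com/http/domain.com/app", "/img/logo.png")

print(over_http)
print(over_nurl)

# The host-relative reference discards the embedded /http/domain.com prefix.
assert "/http/domain.com" not in over_nurl
```

The application would have to write /http/domain.com/img/logo.png to stay inside the embedded address, which is exactly the prefix that is unwanted when the same application is served over http://.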

So as @lgierth mentioned:

If the redirect is purely internal, we should go with fs://$cid16/ipfs/$cid/path/with-in, where $cid16 is only used to have an appropriate origin, but is ignored when it comes to reading the path. It's verbose, but that's okay if it's not visible. It keeps the path consistent which would be a great win.

An fs:// scheme that unifies IPFS and IPNS protocols under one URI scheme would consequently be subject to similar issues. Here's what I've got so far:

  1. All applications loaded using fs:// would require a protocol (IPFS or IPNS) and a CID in the resource paths (e.g. /path/to/my-image.png would need to be /ipfs/<CID>/path/to/my-image.png).
  2. We would lose the concept of a "hostname relative path" for web applications built to be distributed via IPFS.
  3. The fs:// scheme could make it easy for an attacker to exploit data stored under a specific origin (cid32-sensitive) by loading content from a malicious content address (cid58-malicious): fs://<cid32-sensitive>/ipfs/<cid58-malicious>/grab-local-storage-data.html.

lidel commented

Continued in: