Tackle identifying origins with (or without?) dweb: paths
flyingzumwalt opened this issue · 16 comments
This is part of the address scheme work described in #3
The underlying requirement:
Firefox, for example, implements https://url.spec.whatwg.org/, not the URL RFC. That's what we need to use if we want our URLs to be Web-browser compatible.
Background
From @Gozala:
At least in Electron there is no way to make origin be anything other than a hostname, which means all the IPFS content will have either “fs://ipfs” or “fs://ipns” origin. I have started messing with Gecko and I think it may be possible to make origin different from hostname, but even then I won’t be surprised if things don’t quite work out as expected due to implicit assumptions that hostname is an origin. Either way I’d encourage rethinking addressing as I suspect you’ll have pushback from browser vendors, not only due to implementation difficulties but more due to introducing a new model that would work differently from the established one.
Option: include CID in the origin domain
@Gozala (2017-02-01)
I have followed a rabbit hole of implementing a protocol handler for firefox that would handle fs://ipfs/${cid}/path/with-in such that origin would be fs://ipfs/${cid}, but unfortunately my fear got confirmed: it’s not doable without making fundamental changes to the firefox code base, specifically to the parts that deal with content security policy. That is bad because I expect that to be a very hard sell given the implications it could have on millions of Firefox users.
- Hopefully it can be done with re-routing the data flow to fit what firefox expects.
- We can always define schemes like ipfs://${cid}/path/... ipns://${cid}/path/... ipld://${cid}/path/... if that's so much easier, just note it will make other things hard. Ultimately it's about tradeoffs and what we can get away with.
- We've been considering changing fs:// to dweb://, which is clearer, more inclusive of other projects, and avoids repeating "fs" so much (e.g. dweb://ipfs/${cid}/path/...).
- Ultimately, i'm confident we can find a scheme and setup that works for FF, Chrome, other browsers, IPFS, and everyone. It may require changes on our side, or clever re-routing of info (as you've been exploring).
@Gozala (2017-02-06):
It would be invaluable to have those things listed somewhere.
@Gozala (2017-02-01)
Along the way I got some feedback from the people intimately familiar with the relevant code paths in firefox:
Only visible way to implement something like that would be to roll out a new C++ implemented component along the lines of nsIStandardURL and patch the nsScriptSecurityManager component so that for that type of URL the origin will be computed differently. Then also change the nsNetUtil component that is actually responsible for validating whether a resource is within the policy (for example, you can see that for the file: protocol different origin checks are performed).
Option: include public key in the paths
@Gozala (2017-02-06)
Dat for instance is free of this problem as they just use dat://{public_key}/path/with/in so origin is what they want it to be.
origin domains must be case-insensitive
I tried ipfs://${hash} and ipns://${id} as an alternative solution to make things work in Electron. The issue there is that hostnames are case insensitive, while the hashes ipfs uses by default are case sensitive (base58 encoded). Presumably addresses that are not all lower case could be transcoded to base16 to avoid this issue, but even then it is not going to be ideal, as a user may be given an address encoded with base58, and, say, posting it as a link won’t work as expected. Not sure what the best solution is here, but ideally all content addresses will be valid.
What if we resolve through to a CIDv1 encoded in the right base (16 or 32) non-transparently? meaning that we actually resolve through from
fs://ipfs/${CIDv0 or CIDv1 in any base}/path -> ipfs://${CIDv1 in base16 or base32}/path
fs://ipns/${CIDv0 or CIDv1 in any base}/path -> ipns://${CIDv1 in base16 or base32}/path
fs://ipld/${CIDv0 or CIDv1 in any base}/path -> ipld://${CIDv1 in base16 or base32}/path
ipfs://${CIDv0 or CIDv1 in any base}/path -> ipfs://${CIDv1 in base16 or base32}/path
ipns://${CIDv0 or CIDv1 in any base}/path -> ipns://${CIDv1 in base16 or base32}/path
ipld://${CIDv0 or CIDv1 in any base}/path -> ipld://${CIDv1 in base16 or base32}/path
so that the browser can treat ${CIDv1 in base16 or base32} as the origin hostname?
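The resolve-through step above depends on re-encoding a CID in a case-insensitive base. A minimal sketch of that conversion for the common case of a CIDv0 (a base58btc sha2-256 multihash) follows. The function names are hypothetical, the dag-pb codec (0x70) is assumed because that is what CIDv0 implies, and the multibase prefixes `b` (base32) and `f` (base16) are the lowercase variants. It is an illustration, not the conversion any IPFS library actually ships.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a base58btc string; each leading '1' stands for a zero byte."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)  # ValueError on invalid chars
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * zeros + body


def cidv0_to_cidv1(cid_v0: str, base: str = "base32") -> str:
    """Re-encode a CIDv0 as a lowercase CIDv1 usable as a hostname."""
    multihash = b58decode(cid_v0)
    # A CIDv0 is a bare multihash: 0x12 (sha2-256), 0x20 (32 bytes), digest.
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError("not a CIDv0 (sha2-256 multihash expected)")
    # CIDv1 bytes: version 0x01, codec 0x70 (dag-pb), then the multihash.
    raw = b"\x01\x70" + multihash
    if base == "base32":
        return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    if base == "base16":
        return "f" + raw.hex()
    raise ValueError(f"unsupported base: {base}")
```

Both outputs are all lowercase, so a browser that lowercases the host leaves them unchanged. Going back to the original form (for example for Bitswap) would need the reverse mapping, which this sketch does not cover.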
a working solution
@Gozala (2017-02-01)
Now the good news is that, with David Diaz’s help and the necessary fixes, I was able to work out a solution which works as follows:
- fs, ipfs and ipns protocol handlers are added to firefox.
- The fs protocol handler essentially just redirects to either ipfs or ipns as follows:
fs://ipfs/${cid}/path/with-in/ -> ipfs://${cid_v1_base16}/path/with-in
fs:ipfs/${cid}/path/with-in/ -> ipfs://${cid_v1_base16}/path/with-in
fs:///ipfs/${cid}/path/with-in/ -> ipfs://${cid_v1_base16}/path/with-in
fs:/ipfs/${cid}/path/with-in/ -> ipfs://${cid_v1_base16}/path/with-in
fs://ipns/${cid}/path/with-in/ -> ipns://${cid_v1_base16}/path/with-in
fs:ipns/${cid}/path/with-in/ -> ipns://${cid_v1_base16}/path/with-in
fs:///ipns/${cid}/path/with-in/ -> ipns://${cid_v1_base16}/path/with-in
fs:/ipns/${cid}/path/with-in/ -> ipns://${cid_v1_base16}/path/with-in
- Both ipfs and ipns protocol handlers redirect to the corresponding base16 encoded CID path:
ipfs://${cid_v0_base58}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:/${cid_v0_base58}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:///${cid_v0_base58}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:${cid_v0_base58}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs://${cid_v1}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:/${cid_v1}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:///${cid_v1}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
ipfs:${cid_v1}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
The same applies to ipns.
- Both ipfs and ipns protocol handlers serve content from the local node (that is assumed to be running), meaning that firefox will show the URLs on the left but will serve content from the URLs on the right:
ipfs://${cid_v1_base16}/path/with-in => localhost:8080/ipfs/${cid_v0_base58}/path/with-in
ipns://${cid_v1_base16}/path/with-in => localhost:8080/ipns/${cid_v0_base58}/path/with-in
As a consequence of all the redirects, everything works under (what I assume to be) the desired origin policy, where the origin is either ipfs://${cid_v1_base16}/ or ipns://${cid_v1_base16}/ respectively.
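The redirect rules above can be sketched as two small functions. This is an illustration only, not the code of the Firefox handler: `canonical_url`, `gateway_url` and the converter callbacks `to_base16` and `to_gateway_cid` are hypothetical names, and the CID conversion itself is left to the caller. The `http://` prefix on the gateway address is also an assumption, since the thread only writes `localhost:8080`.

```python
import re

# fs:ipfs/..., fs:/ipfs/..., fs://ipfs/..., fs:///ipfs/... (same for ipns)
FS_FORM = re.compile(r"^fs:/{0,3}(ipfs|ipns)/([^/]+)(/.*)?$")
# ipfs:<cid>/..., ipfs:/<cid>/..., ipfs://<cid>/..., ipfs:///<cid>/...
NATIVE_FORM = re.compile(r"^(ipfs|ipns):/{0,3}([^/]+)(/.*)?$")
CANONICAL_FORM = re.compile(r"^(ipfs|ipns)://([^/]+)(/.*)?$")


def canonical_url(url, to_base16):
    """Map any accepted spelling to <protocol>://<base16 cid>/<path>.

    `to_base16` is a caller-supplied (hypothetical) CID converter; it should
    return its input unchanged when the CID is already base16.
    """
    match = FS_FORM.match(url) or NATIVE_FORM.match(url)
    if match is None:
        raise ValueError(f"unsupported address: {url!r}")
    protocol, cid, path = match.groups()
    # Trailing slashes are dropped to mirror the mappings listed above.
    path = (path or "").rstrip("/")
    return f"{protocol}://{to_base16(cid)}{path}"


def gateway_url(canonical, to_gateway_cid, gateway="http://localhost:8080"):
    """Map the displayed URL to the local gateway URL that serves it."""
    match = CANONICAL_FORM.match(canonical)
    if match is None:
        raise ValueError(f"not a canonical address: {canonical!r}")
    protocol, cid, path = match.groups()
    return f"{gateway}/{protocol}/{to_gateway_cid(cid)}{path or ''}"
```

With a real converter plugged in, `canonical_url("fs:/ipfs/<cid>/path/with-in/", convert)` would yield the `ipfs://` form shown in the address bar, and `gateway_url` would yield the address the content is actually fetched from.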
Just to be clear, the solution described as working only works in firefox. In electron you would not be able to do the following:
ipfs:///${cid_v0_base58}/path/with-in -> ipfs://${cid_v1_base16}/path/with-in
This is because the API exposed for protocol handlers is passed an already normalized URL, meaning the hostname (cid_v0_base58) is already lowercased, so there is no way of knowing what the original CID was.
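To illustrate why a lowercased base58 hostname cannot be recovered, here is a small sketch with made-up short strings rather than real CIDs. It assumes nothing beyond the base58btc alphabet. Two distinct base58 strings can collapse to the same lowercase form, and lowercasing can even produce characters (such as `l`) that are not in the alphabet at all.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58_to_int(text: str) -> int:
    """Interpret a base58btc string as an integer (ValueError if invalid)."""
    value = 0
    for ch in text:
        value = value * 58 + B58_ALPHABET.index(ch)
    return value


first, second = "QmAbC", "QmabC"  # toy values, not real CIDs
print(b58_to_int(first) != b58_to_int(second))  # different values ...
print(first.lower() == second.lower())          # ... same lowercased host
print("l" in B58_ALPHABET)                      # 'L'.lower() is not valid base58
```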
And here are the protocol implementations that I was referring to as a working solution:
https://github.com/Gozala/firefox-ipfs-protocol/blob/master/src/index.js
fs:/ipfs/${cid}/path/with-in/ -> ipfs://${cid_v1_base16}/path/with-in
@Gozala @diasdavid This is already pretty good, but I see two potential issues and a potential fix:
- It doesn't retain the original path. We'll run into tricky edge cases reconstructing it, as ipfs-related code will run in all sorts of situations. We should make sure to always have the original canonical path at hand (simplicity).
- The way Bitswap works probably requires converting the base16 CID back to the original form. Note that it's not necessarily a v0 base58 CID (starting in Qm) - it can be anything, even another base16 CID. Another reason to keep the original path at hand.
From what I understand, the mapping to a whatwg-compatible URL with an origin happens purely internally, within the fs: protocol handler? If so, we can retain the whole path, and still construct the host from it.
fs:/ipfs/$cid/path/with-in => fs://$cid16/ipfs/$cid/path/with-in
(fs:/ipfs/hash and fs:/ipns/hash would generate the same origin, but that is okay I think.)
Also note that it removes the need for multiple protocols (ipfs:// ipns:// ipld:// foo://). It basically unifies the fs URI and whatwg URL we were mapping between, into one scheme which can work both in an origin-required, and in an origin-less context.
It simply defines the whatwg-compatible, originful fs URI as: set host to cidv1b16($cidFromPath).
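A minimal sketch of that definition, assuming a caller-supplied `cidv1b16` converter (hypothetical here, as are the function names): the host is derived from the CID in the path, and the original path is kept verbatim so it can be read back without touching the host.

```python
import re

FS_PATH_FORM = re.compile(r"^fs:(/(?:ipfs|ipns)/([^/]+)(?:/.*)?)$")
ORIGINFUL_FORM = re.compile(r"^fs://[^/]+(/.*)$")


def originful_fs_uri(fs_uri, cidv1b16):
    """fs:/ipfs/$cid/path -> fs://$cid16/ipfs/$cid/path (path kept intact)."""
    match = FS_PATH_FORM.match(fs_uri)
    if match is None:
        raise ValueError(f"unsupported fs URI: {fs_uri!r}")
    path, cid = match.groups()
    return f"fs://{cidv1b16(cid)}{path}"


def original_path(originful_uri):
    """Read the path back; the host only provides the origin and is ignored."""
    match = ORIGINFUL_FORM.match(originful_uri)
    if match is None:
        raise ValueError(f"not an originful fs URI: {originful_uri!r}")
    return match.group(1)
```

As noted in the comment, fs:/ipfs/hash and fs:/ipns/hash produce the same host under this scheme, because only the CID feeds into it.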
What do you think, does this make sense?
On the encoding note, I think base32 would work too: https://github.com/neocities/hshca
cc @lidel too
To make a long story short, this choice depends on whether the "protocol handler redirect" in the addon is transparent, or propagates into the UX, e.g. whether it behaves as visibly as an HTTP redirect.
If the redirect is purely internal, we should go with fs://$cid16/ipfs/$cid/path/with-in, where $cid16 is only used to have an appropriate origin, but is ignored when it comes to reading the path. It's verbose, but that's okay if it's not visible. It keeps the path consistent which would be a great win.
If the redirect is visible, we should go with ipfs://$cid16/path/with-in (same for ipns://). It's less verbose. ("We can work on unifying the fs-db-web rift later." -- jbenet).
There are various moving pieces and I am afraid we may be oversimplifying here, but I'd say the redirect is internal, meaning there are off-the-channel transformations made by the add-on before the actual HTTP request to a gateway is made. At least this is what happened in legacy-sdk.
Hard to say how it will work in future.
WebExtensions do not have an API for custom protocol handlers yet:
- ipfs/ipfs-companion#164
- Firefox Bug 1271553 - Add ability to implement programmable custom protocol handler
- I described the need for ability to control how Origin is handled by custom protocol in Firefox WebExtension: https://bugzilla.mozilla.org/show_bug.cgi?id=1271553#c47
Going with fs://$cid16/ipfs/$cid/path/with-in would solve origin issues, but AFAIK there is no API for rewriting the address in the Location Bar, so $cid16 would always be visible.
Let's go with ipfs://$cid32/path/within and ipns://$cid32/path/within -- as discussed in ipfs/specs#152 (comment)
If we go that route, I feel js-ipfs-api should provide CID version detection and conversion as one of its utility functions.
If we go that route, I feel js-ipfs-api should provide CID version detection and conversion as one of its utility functions.
Agreed -- copy-pasting from ipfs/specs#152:
- e.g. ipfs://$hash
- Well understood, adhering to WHATWG URL standard
- Straightforward to implement
- $hash is host/origin
- Issues:
  - Hash needs to be base32 encoded, because URL hosts are case-insensitive.
    - Solution 1/2: Needs a redirect to the base32 hash when pasting any non-base32 hashes.
    - Solution 2/2: Probably needs a special base32 CID that retains information about the original non-base32 CID, so that we can avoid confusing UX around changing hashes.
  - Doesn't retain the path, but instead needs conversion step to/from URI.
    - Solution: The URI path can be derived from URL scheme and host and path.
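The last point, deriving the URI path from the URL's scheme, host and path, could look roughly like the sketch below. The function names are hypothetical, and the host is assumed to be a base32 (already lowercase) CID. Mapping it back to whatever encoding the user originally pasted is not handled.

```python
from urllib.parse import urlsplit

PROTOCOLS = ("ipfs", "ipns")


def url_to_path(url: str) -> str:
    """ipfs://<cid>/path -> /ipfs/<cid>/path"""
    parts = urlsplit(url)
    if parts.scheme not in PROTOCOLS or not parts.netloc:
        raise ValueError(f"unsupported URL: {url!r}")
    return f"/{parts.scheme}/{parts.netloc}{parts.path}"


def path_to_url(path: str) -> str:
    """/ipfs/<cid>/path -> ipfs://<cid>/path"""
    pieces = path.split("/", 3)
    if len(pieces) < 3 or pieces[0] or pieces[1] not in PROTOCOLS or not pieces[2]:
        raise ValueError(f"unsupported path: {path!r}")
    rest = "/" + pieces[3] if len(pieces) == 4 else ""
    return f"{pieces[1]}://{pieces[2]}{rest}"
```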
Hi, I just wanted to jump in to say that @heavenlyhash was just visiting me and we discussed this issue, and what we realised is that it is not possible to have the single slash variants like ipfs:/hash because that's actually a valid POSIX path and NOT a valid URL. It would break web browsers to implement that, so it's just out. Try
$ mkdir -p ipfs://foo
$ realpath ipfs:/foo/
/home/timothy/Pictures2/ipfs:/foo
Note how the second slash just disappears? Since it is not possible to have a directory with a zero length name in POSIX, // is impossible within a path.
However,
$ mkdir -p ipfs:/foo
$ realpath ipfs:/foo/
/home/timothy/Pictures2/ipfs:/foo
can exist. So basically, ipfs:/ with a single slash would be invading the POSIX namespace and that would not be accepted by browser developers (I hope). Right now it is possible to navigate to, say, /usr/bin by typing that into the address bar, and you're not going to break that feature just for the aesthetic displeasure of a single extra slash...
(@lgierth please let me know if I got this right or missed anything here)
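The same collapse can be reproduced without touching the filesystem. The sketch below uses Python's `posixpath.normpath` as a stand-in for what `realpath` does to the repeated slash, and `urlsplit` to show that only the double-slash form carries a host. The helper names are illustrative, and this says nothing about how any particular browser treats address-bar input.

```python
import posixpath
from urllib.parse import urlsplit


def as_posix_path(text: str) -> str:
    """Normalize the text as if it were a relative POSIX path."""
    return posixpath.normpath(text)


def has_host(text: str) -> bool:
    """True when the text, parsed as a URL, has an authority component."""
    return urlsplit(text).netloc != ""


print(as_posix_path("ipfs://foo/"))  # the empty path segment is dropped
print(as_posix_path("ipfs:/foo/"))   # already a plain relative path
print(has_host("ipfs://foo"), has_host("ipfs:/foo"))
```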
Tackled: theory behind Origin's case-insensitivity and CIDs
I think the conclusion was that in an ideal world we would have CID/URL normalization done by browser add-on.
Add-on would detect requests with unsafe CID (case-sensitive, eg. base58) and convert them to safe ones (case-insensitive, eg. base32/16) before sending them to IPFS gateway.
This way Origin based on first segment after :// would "just work".
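The detection half of that normalization might be sketched as below. The check relies on the multibase prefixes `b` (lowercase base32) and `f` (lowercase base16). The function names and the `convert` callback are hypothetical, and this is not the utility js-ipfs-api would provide.

```python
SAFE_MULTIBASE_PREFIXES = ("b", "f")  # lowercase base32 and base16


def is_origin_safe(cid: str) -> bool:
    """True when lowercasing the CID, as a URL host would be, changes nothing."""
    return cid.startswith(SAFE_MULTIBASE_PREFIXES) and cid == cid.lower()


def normalize_cid(cid: str, convert) -> str:
    """Return the CID unchanged if it is safe, else run the supplied converter."""
    return cid if is_origin_safe(cid) else convert(cid)
```

A CIDv0 such as `Qm...` or a base58btc CIDv1 (`z...`) fails the check and is handed to `convert`, while `bafy...` style CIDs pass through untouched.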
Missing: Utility method in js-ipfs-api for CID detection/conversion
I did not see it being mentioned anywhere, so I assume it is missing.
Missing: WebExtension API for Programmable Protocol Handler
WebExtensions do not provide means for defining Persistent/Programmable Custom Protocol (that stays in address bar and provides proper Origin support). Firefox 54 will only support simple redirects from web+ipfs://<path> to https://ipfs.io/<path>, which breaks Origin barrier.
There is an open ticket at Bugzilla about the need for a better API for persistent/programmable protocol handlers:
- Firefox Bug 1271553 - Add ability to implement programmable custom protocol handler
- I described the need for ability to control how Origin is handled by custom protocol in Firefox WebExtension: https://bugzilla.mozilla.org/show_bug.cgi?id=1271553#c47
That is all I know, hope it helps.
I'd like to contribute some input on the discussion about address schemes.
NURLs?
As far as I can tell, there is a loose definition for NURLs. I'm assuming a NURL is a "Nestable URL" and would look something like either /ipfs/<cid>/path or /https/<domain>/path. I'd like to see a reference to a more solid specification for NURLs as I haven't been able to find one. However, I take it that a NURL is an alternative to a URL which can be embedded within other syntaxes.
dweb vs fs
It is my opinion that neither dweb nor fs is an appropriate URI scheme name. My argument is that both are arbitrary, and in the end it doesn't matter what we name them (either will still "work"). It is my opinion that the URI scheme protocol name should reflect what the address actually is: a URI scheme for NURLs. A nurl:// scheme should be introduced.
NURL URI Scheme
nurl://<origin><nurl>
This is much like the proposed fs:// scheme. However, it can be used with any NURI:
/ipfs/<cid58>/path => nurl://<cid32>/ipfs/<cid58>/path
/http/<domain>/path => nurl://<domain>/http/<domain>/path
The NURL is embedded directly in the URI scheme after a computed origin. The implementation would have full control over the origin, which leads to a decoupling of the NURL address and its origin.
CORS for NURLs
Since NURLs are directly embedded in the scheme, cross-origin requests can be made possible. Content with an origin of <cid58-A> can make a request for content with an origin of <cid58-B> through this conversion:
/ipfs/<cid58-B>/path => nurl://<cid58-A>/ipfs/<cid58-B>/path
Security
It must be noted that for protocols like HTTP that are subject to CSRF attacks, the same-origin policy should be enforced at the NURL URI scheme level. However, the option for an alternative policy is available for protocols like IPFS.
EDIT
See Addendum: #6 (comment)
Base 32 Encodings
Might I suggest the use of Crockford's Encoding for base32 encoded content addresses. It is my opinion that this encoding is the best choice. My reasons:
- The encoding includes all numerical digits (0-9); seems reasonable for a number encoding
- Begins with numerical digits like hex (0 is a value of zero, 9 is a value of nine, etc).
- Prefers numbers over letters (1 over I or L; 0 over O)
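For reference, a minimal sketch of Crockford's encoding applied to a non-negative integer. The function names are made up for this example, and the optional check symbols from Crockford's description are left out. Decoding is case-insensitive and folds the look-alike letters I, L and O onto digits, which is the property that matters for hostnames.

```python
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, U
LOOKALIKES = {"I": "1", "L": "1", "O": "0"}


def crockford_encode(number: int) -> str:
    """Encode a non-negative integer, most significant symbol first."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    if number == 0:
        return "0"
    symbols = []
    while number:
        number, remainder = divmod(number, 32)
        symbols.append(CROCKFORD_ALPHABET[remainder])
    return "".join(reversed(symbols))


def crockford_decode(text: str) -> int:
    """Decode case-insensitively, ignoring hyphens and folding I/L/O."""
    value = 0
    for ch in text.upper().replace("-", ""):
        ch = LOOKALIKES.get(ch, ch)
        value = value * 32 + CROCKFORD_ALPHABET.index(ch)
    return value
```

A CID's bytes could be fed in via `int.from_bytes(raw, "big")`, though leading zero bytes would then need separate handling.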
EDIT
Added notes regarding Base 32 encoding and Crockford's Encoding particularly here: ipfs/kubo#4143 (comment)
Addendum to the topic of NURL
I should note that a NURL is an incorrect reference to a NURI mentioned at ipfs/kubo#1678 (comment). It is worth mentioning that neither NURL nor NURI currently has a formal specification, as far as I am aware.
With that said, I see no reason to choose nurl:// over fs:// as a URI scheme. In fact, the idea for a nurl:// scheme in the way that I have proposed exhausts itself in complexity. Making the scheme support existing protocols like HTTP poses further challenges. For example, a web application loaded from nurl://domain.com/http/domain.com/app which includes a relative path to a resource (such as an image) would require the protocol and hostname prefixed in order to behave as expected. If the same web application is loaded from the http:// URI scheme, the prefixes are not wanted. This makes the nurl:// scheme incompatible with web applications built using existing URI schemes.
So as @lgierth mentioned:
If the redirect is purely internal, we should go with fs://$cid16/ipfs/$cid/path/with-in, where $cid16 is only used to have an appropriate origin, but is ignored when it comes to reading the path. It's verbose, but that's okay if it's not visible. It keeps the path consistent which would be a great win.
An fs:// scheme that unifies IPFS and IPNS protocols under one URI scheme would consequently be subject to similar issues. Here's what I have so far:
- All applications loaded using fs:// would require a protocol (IPFS or IPNS) and a CID in the resource paths (e.g. /path/to/my-image.png would need to be /ipfs/<CID>/path/to/my-image.png).
- We would lose the concept of a "hostname relative path" for web applications built to be distributed via IPFS.
- The fs:// scheme could make it easy for an attacker to exploit data stored under a specific origin (cid32-sensitive) by loading content from a malicious content address (cid58-malicious): fs://<cid32-sensitive>/ipfs/<cid58-malicious>/grab-local-storage-data.html.
Continued in:
- Migration to CIDv1 (default base32): https://github.com/ipfs/ipfs/issues/337 (solves Origin for public gateways (#89) and ipfs:// protocol)
- IPFS Addressing in Web Browsers: ADDRESSING.md
- Support Custom Protocols in WebExtension: ipfs/ipfs-companion#164
- Shared stewardship of the "dweb" protocol handler: arewedistributedyet/arewedistributedyet#28
- dweb://{id}.{ipfs|ipns|dat|ssb|etc} proposal (solves Origin cross-protocol): arewedistributedyet/arewedistributedyet#28 (comment)