Add more detail to security section
cwilso opened this issue · 23 comments
Perhaps some of the detail from mozilla/standards-positions#58 (comment).
As per the linked discussion it might be a good idea to specifically make reference to the upload-firmware threat that Mozilla is worried about.
To address that specific concern, which is only an issue if the page wants to send 0xF0-prefixed SysEx messages, maybe it would be a good idea to add a suggestion that the user is prompted/warned. Other messages in the same range (e.g. 0xF1, 0xF8, etc.) are not a problem when it comes to this particular security concern.
The current spec already covers the case.
Any sysex won't be available by default.
Sites need to call the API with sysex option, and it results in permission prompting.
Actually, the spec suggests always prompting, and Chrome plans to do so within months.
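For reference, a minimal sketch of what requesting access with the sysex option looks like. `navigator.requestMIDIAccess` and its `sysex` option are part of the Web MIDI API; the wrapper name `requestMidi` and the injected `nav` parameter are assumptions made here so the snippet can also be exercised outside a browser.

```javascript
// Hypothetical wrapper around the Web MIDI entry point.
// `nav` is expected to be the browser's `navigator` object (or a stand-in for tests).
function requestMidi(nav, wantSysex) {
  if (!nav || typeof nav.requestMIDIAccess !== "function") {
    // Web MIDI is not exposed here (unsupported browser or non-secure context).
    throw new Error("Web MIDI is not available in this context");
  }
  // Returns a Promise; the user agent decides whether and how to prompt the user.
  return nav.requestMIDIAccess({ sysex: Boolean(wantSysex) });
}
```

In a page this would be called as `requestMidi(navigator, true)`; the returned promise rejects if the user or the user agent denies the request.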
Thank you, yes, the current spec covers the basic case and "prompt always" is good.
However from the security perspective that Mozilla advocates it seems like the granularity of the cases in the spec is not high enough. They are worried about case (3) below where insecure user devices may be connected:
- No-SysEx use (safe).
- SysEx use but without 0xF0 custom SysEx messages (safe).
- SysEx use with 0xF0 custom messages (potentially unsafe).
In my experiments the {"sysex": true} setting is required for both (2) & (3) MIDI messages. For example, to enable "Song Position Pointer" (0xF2) messages in Chrome I had to pass the sysex option. SPP messages belong to (2) above.
From what I am reading from Mozilla it seems like they think situation (3) is sufficiently dangerous as to warn the user more sternly and there is nothing in the spec about this concern.
In addition it seems like (2) is closer to (1) in terms of security and functionality (e.g. SPP messages are more like note-on and control-change messages than they are like custom sysex messages). Even in the MIDI spec they are not called "SysEx" but fall in the "System Common Messages" section.
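To make the three cases listed above concrete, here is a rough sketch that sorts a MIDI status byte into them. The function name `classifyStatusByte` and the numeric levels are illustrative assumptions, not anything from the spec, and placing 0xF7 (End of Exclusive) with level 3 is a choice made here because it only occurs as part of a SysEx message.

```javascript
// Hypothetical helper: maps a MIDI status byte to the informal levels (1)-(3)
// used in this comment. Not part of the Web MIDI API.
function classifyStatusByte(status) {
  if (!Number.isInteger(status) || status < 0x80 || status > 0xff) {
    throw new RangeError("not a MIDI status byte: " + status);
  }
  // 0x80-0xEF: channel messages such as note-on and control-change.
  if (status < 0xf0) return 1;
  // 0xF0 starts a System Exclusive message, 0xF7 terminates it.
  if (status === 0xf0 || status === 0xf7) return 3;
  // Remaining 0xF1-0xF6 and 0xF8-0xFF: system common and real-time messages,
  // e.g. Song Position Pointer (0xF2) or Timing Clock (0xF8).
  return 2;
}
```

With this grouping, `classifyStatusByte(0xF2)` lands in level 2 alongside the other system common messages, and only 0xF0-framed messages reach level 3.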
I personally do not feel the security issues raised are as serious as some people indicate on that thread, but I understand their position and I would like to see MIDI support land in Firefox.
We will clarify a sub-set of safe sysex common messages and will update the spec to explain the three levels of privileges. Some platforms may not support the strongest vendor/product-specific messages, like updating firmware via sysex messages.
@toyoshim that sounds fantastic! Hopefully it'll convince Mozilla that a safe implementation of Web MIDI is possible.
@chr15m, "SysEx without 0xF0" is inaccurate, please check out the MMA's classification of MIDI messages. I've just amended it with the respective status bytes to clarify. The MMA's position is that only category 5 is potentially harmful.
2023 TPAC Audio WG Discussion:
- Break up the current privacy/security section into a separate section. (This is a V1 blocker.)
- Review the details provided in this issue and select the ones that are relevant to the new security section. (Not a V1 blocker, but relatively easy to do)
- Create, review, and merge a PR.
Here is some more information on breaking up the privacy/security section: https://w3c.github.io/documentreview/#how_to_get_horizontal_review
From reviewing the RFCs:
I think the way I will split this is to have the privacy section focus on fingerprinting and tracking concerns, and the security section focus on everything else. The initial breaking up might require some minor rewriting as well to make everything flow properly.
I was told to also post my concerns here.
My idea was to develop a hardware synth with a web interface, but the SecureContext requirement makes this use case practically impossible. Implementing SSL/TLS in an embedded http/websocket server would add significant overhead, and dealing with SSL certificates here is entirely impractical.
I understand the concerns about sysex output, although it's not entirely clear to me how SSL improves this. For device enumeration and input though, I do believe the SecureContext requirement is overly restrictive, and I'm hoping it can be relaxed.
I'm not sure I understand your use case, could you elaborate? I'm not sure why you would have an embedded server - if you are developing a web interface for a hardware synth, you can put the web interface on a public-facing web server (even on Github); if you were putting an embedded server on the hardware synth itself, you probably don't need web MIDI (because the server could directly drive the synth).
Maybe it's a bit niche, but I thought it would be a valid use case. The idea is to have a simple http server integrated in the synth itself, serving a static html/js page. On the client side you can then connect a controller via WebMIDI and communicate directly with the synth via websockets.
True, I suppose I don't need it since you could connect the controller directly by MIDI cables. But it would be very convenient to have, and could also enable more exotic workflows (eg. collaborative use by sharing over a vpn, if latency permits).
This is the same situation when using a native wrapper library like Cordova, Capacitor, Ionic etc. Most things work on these platforms because browsers generally make an exception to the SSL rules for pages served over localhost. Chrome allows webmidi over localhost for example. If Firefox does not, it would rule out using the Firefox engine in such situations.
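To illustrate the localhost exception, here is a simplified approximation of the kind of check involved. It is not the full "Is origin potentially trustworthy?" algorithm from the Secure Contexts spec, and the function name `looksPotentiallyTrustworthy` is made up for this sketch; in a real page `window.isSecureContext` reports the browser's actual decision.

```javascript
// Simplified, assumption-laden approximation of a trustworthiness check.
// Real user agents apply more rules (and may allow user-configured origins).
function looksPotentiallyTrustworthy(urlString) {
  const url = new URL(urlString);
  // Authenticated schemes are treated as trustworthy.
  if (url.protocol === "https:" || url.protocol === "wss:") return true;
  const host = url.hostname;
  // Loopback names and addresses are commonly exempted.
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (/^127\.\d+\.\d+\.\d+$/.test(host)) return true;
  if (host === "[::1]") return true;
  // Anything else over plain HTTP, including LAN addresses, is not.
  return false;
}
```

Under this approximation a page served from `http://localhost:8080/` passes, while an embedded device reached at a LAN address such as `http://192.168.1.50/` does not, which is the situation described for the synth use case.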
@jwt27 Your use case actually isn't niche at all, and I think it should be given more consideration. We have the same problem on every network connected appliance that has a web server built-in. None of the clients for these devices are allowed to use any of the more "advanced" web platform features. With Google's persistent push to prevent functionality for origins without HTTPS, all of these types of devices' capabilities are limited. What's worse is that this results in quite insecure workarounds, like requiring all your users to install software to access your synth (or your network camera, or router, or, etc.).
I've made a more general "web we want" post about this: WebWeWant/webwewant.fyi#245 (comment) Unfortunately it's been mostly ignored and I do not know how to pursue it further. A comment about your synth use case would be welcomed there, but I don't know if it's worth your time as I'm not sure posts there are properly considered.
I hear your pain trying to make a web-enabled hardware device. Self-signed certificates are annoying to your users, and any overhead in one part of the system directly takes away from the other parts.
The way the specification is written now, all of the interfaces require a SecureContext and we don't have separate input and output methods for SysEx and non-SysEx messages. I don't see a clear way to relax the specification only for non-SysEx use cases. I am also reluctant to propose major changes to the interfaces since there are already shipping implementations of Web MIDI.
If there is a specific suggestion for how to spec this out please make a pull request and the working group can review it. Please keep in mind that this use case is at least uncommon, even if not "niche", and that the current security model was considered and discussed at length.
I don't think the Web MIDI API is going to drive sweeping changes around how HTTPS is used on the Web, unfortunately. One thing to keep in mind is that even if it's not intended, it's possible to expose embedded web servers to the greater Internet. Web APIs have to assume that they will be used on the Web, and I personally feel like it's worth being a bit conservative when there is potential for abuse. But as noted above, if we can find a clean way to modify the spec which satisfies this use case I am open to it.
@jwt27 Perhaps I'm not understanding how this hardware device would be connected. I think you mean the hardware synth would have a network connection that you would use for its connection to the computer, and you would have some kind of discovery, or a hard-coded IP address on local network, and navigate to that. That seems... odd. If you're building a synthesizer, why wouldn't you make it a MIDI device (likely USB-MIDI), if only to make it easy to insert into a typical music setup (i.e. work like any other synth device)? And if you DID do that, then it's pretty trivially easy to have a MIDI loop app somewhere on the Web that connects your device (which would have a known device name) and any controller you like, and it would be trivial to have that web app live on a secure connection. (Even with a service worker, so after the first time you wouldn't need to be network-connected.)
@bradisbell you are very incorrect in painting this as Google's persistent push to prevent [powerful] functionality for origins without HTTPS; this is an industry-wide push, and as I previously pointed out, is a strong recommendation from the W3C TAG (only one member of the TAG works for Google). This is a security choice by people who have deeply explored how web security needs to work.
As an aside, I don't think it would be a good idea to regress to not requiring a secure context; at the very least this would need to be justified to the TAG (who review Web APIs).
Maybe part of the problem is that users have no say in how a "secure context" is defined. I think they should at least be able to whitelist local subnets.
And if you DID do that, then it's pretty trivially easy to have a MIDI loop app somewhere on the Web that connects your device (which would have a known device name) and any controller you like, and it would be trivial to have that web app live on a secure connection. (Even with a service worker, so after the first time you wouldn't need to be network-connected.)
Sure, all of that can be done via external services. But it would be much nicer if it was all self-contained in one box.
The Secure Contexts specification does allow user-agents to have a way to allow end users to configure a set of origins as trustworthy, although this is intended for development:
https://w3c.github.io/webappsec-secure-contexts/#is-origin-trustworthy
https://w3c.github.io/webappsec-secure-contexts/#development-environments
This doesn't guarantee that every user-agent has this ability though.
It sounds like this discussion might be expanding beyond the scope of Web MIDI. If the issue is about how secure context is defined, it's probably more useful to discuss over there since we can't change that with the Web MIDI spec (https://github.com/w3c/webappsec-secure-contexts/issues). If it is about how a particular browser or other user-agent implements secure context it's probably better to work with that project's bug tracker, for the same reason.
Of course it's fine to continue discussing here. We should consider what SecureContext is trying to protect against:
https://w3c.github.io/webappsec-secure-contexts/#threat-models-risks
The following quote is particularly relevant, since it shows that we can't rely solely on user permissions: "Granting permissions to unauthenticated origins is, in the presence of a network attacker, equivalent to granting the permissions to any origin."
Enumerating devices is a potential fingerprinting vector, so access to those lists should be in a SecureContext. But as I understand it, the Web MIDI API can't really function without first using MIDIAccess. So I don't see how we could satisfy @jwt27's use case while keeping device enumeration secure. Am I missing something?
And if you DID do that, then it's pretty trivially easy to have a MIDI loop app somewhere on the Web that connects your device (which would have a known device name) and any controller you like, and it would be trivial to have that web app live on a secure connection. (Even with a service worker, so after the first time you wouldn't need to be network-connected.)
Sure, all of that can be done via external services. But it would be much nicer if it was all self-contained in one box.
But I'm not sure how you would put this in "one box". Your user needs to "go" to something - the best experience would probably be via WebUSB, since it could redirect to a web app upon the device being plugged in. But in your "local server" case, how would you set up that local server? Making the hardware a TCP/IP device, with an embedded web server? (I'm presuming this is what you mean, since that's the case where it would be more difficult to use HTTPS.) The user experience discovering that seems like it would be less than ideal; you'd need to provide network configuration and discovery (unless you give it a hardcoded fixed IP address). The only benefit of this is it would work if the entire system was not connected to the Internet. By contrast, you could either have a web URL to direct the user to and a MIDI-connected synth (what I suggested above), or you could make your hardware device a USB device that supported WebUSB (in which case it can pop up a "WebSynth detected, click to go to mywebsynth.com" dialog when the device is connected).
IP configuration could be static or via DHCP, and the IP address would be shown on the synth's display. Or it could announce its hostname via mDNS or similar. In any case, configuration would be a one-time event, I don't see this as a major hurdle.
The Secure Contexts specification does allow user-agents to have a way to allow end users to configure a set of origins as trustworthy, although this is intended for development: https://w3c.github.io/webappsec-secure-contexts/#is-origin-trustworthy https://w3c.github.io/webappsec-secure-contexts/#development-environments This doesn't guarantee that every user-agent has this ability though.
It sounds like this discussion might be expanding beyond the scope of Web MIDI. If the issue is about how secure context is defined, it's probably more useful to discuss over there since we can't change that with the Web MIDI spec (https://github.com/w3c/webappsec-secure-contexts/issues). If it is about how a particular browser or other user-agent implements secure context it's probably better to work with that project's bug tracker, for the same reason.
Thanks, I see this discussion already exists:
w3c/webappsec-secure-contexts#60
Will subscribe there too.
Of course it's fine to continue discussing here. We should consider what SecureContext is trying to protect against: https://w3c.github.io/webappsec-secure-contexts/#threat-models-risks The following quote is particularly relevant, since it shows that we can't rely solely on user permissions: "Granting permissions to unauthenticated origins is, in the presence of a network attacker, equivalent to granting the permissions to any origin."
Enumerating devices is a potential fingerprinting vector, so access to those lists should be in a SecureContext. But as I understand it, the Web MIDI API can't really function without first using MIDIAccess. So I don't see how we could satisfy @jwt27's use case while keeping device enumeration secure. Am I missing something?
A possible solution is to simply remove all identifiable information. In an insecure context, enumeration could always return a "Default MIDI Device", regardless of whether one exists. Which specific device that is mapped to is then left up to the browser, eg. by prompting the user on the permission dialog.
Access to multiple devices would then still be locked behind a SecureContext, but I expect the general use case requires only a single input and/or output device.
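A rough sketch of that idea, purely hypothetical: nothing like this exists in the Web MIDI spec or in any browser. The function name `portsVisibleToPage`, the placeholder id `"default"`, and the shape of the port objects are all assumptions made for illustration.

```javascript
// Hypothetical user-agent-side filter for the proposal above.
// `realPorts` is the actual device list; `isSecure` is the context's secure-context status.
function portsVisibleToPage(realPorts, isSecure) {
  if (isSecure) {
    // Secure contexts keep today's behaviour: the full list, with identifying fields.
    return realPorts.map((p) => ({ id: p.id, name: p.name, manufacturer: p.manufacturer }));
  }
  // Insecure contexts always see exactly one generic entry, whether or not any
  // device exists, so the list carries no fingerprinting information.
  return [{ id: "default", name: "Default MIDI Device", manufacturer: "" }];
}
```

Which physical device the placeholder maps to would be decided by the browser, for example from the user's choice in the permission dialog, and is outside this sketch.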
Oops, didn't mean to close this. Sorry.