w3c/mediasession

Add new actions to support video conferencing websites

Closed this issue · 16 comments

In order to make the Media Session API more useful for video conferencing websites using WebRTC, we should add the following actions:

"togglemicrophone"
"togglecamera"
"hangup"

We would allow the UA to have default handlers for these actions if they so wish (e.g. stop the website from receiving microphone data when the user toggles the microphone off).

Since the UA doesn't necessarily know what states the microphone and camera are in, we would also need to add two new methods to the MediaSession object so the website can report the current states:

setMicrophoneActive(boolean active)
setCameraActive(boolean active)

Here is an example usage:

navigator.mediaSession.setActionHandler("togglemicrophone", function() {
  // Handle muting or unmuting the microphone.
  // Will likely call "navigator.mediaSession.setMicrophoneActive(true|false);" at some point.
});

navigator.mediaSession.setMicrophoneActive(true);
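
To make the flow above concrete, here is a minimal sketch of how a site might wire the proposed action to its local audio tracks and report the resulting state back to the UA. "togglemicrophone" and setMicrophoneActive() are the names proposed in this issue; registerMicrophoneToggle, fakeSession, and fakeStream are hypothetical names used only for illustration, and the two fake objects stand in for navigator.mediaSession and a MediaStream so the snippet can run outside a browser.

```javascript
// Sketch only. `session` is expected to look like navigator.mediaSession with
// the proposed setMicrophoneActive() method; `localStream` is a
// MediaStream-like object exposing getAudioTracks().
function registerMicrophoneToggle(session, localStream) {
  // Start from whatever the tracks currently say.
  let active = localStream.getAudioTracks().some((track) => track.enabled);

  const handler = () => {
    // The action carries no state, so the site flips its own state...
    active = !active;
    for (const track of localStream.getAudioTracks()) {
      track.enabled = active;
    }
    // ...and then tells the UA what the new state is.
    session.setMicrophoneActive(active);
  };

  try {
    session.setActionHandler("togglemicrophone", handler);
  } catch (err) {
    // setActionHandler throws for actions the UA does not recognize.
    return null;
  }

  // Report the initial state so browser UI starts out in sync.
  session.setMicrophoneActive(active);
  return handler;
}

// Hypothetical stand-ins so the sketch runs without a browser.
const fakeSession = {
  handlers: {},
  microphoneActive: null,
  setActionHandler(action, handler) {
    this.handlers[action] = handler;
  },
  setMicrophoneActive(active) {
    this.microphoneActive = active;
  },
};
const fakeStream = {
  tracks: [{ enabled: true }],
  getAudioTracks() {
    return this.tracks;
  },
};

registerMicrophoneToggle(fakeSession, fakeStream);
// Simulate the user pressing the browser's microphone button once.
fakeSession.handlers["togglemicrophone"]();
```

In a page, the call would presumably be registerMicrophoneToggle(navigator.mediaSession, stream), where stream comes from getUserMedia(). Keeping the state on the site side matches the point above that the UA doesn't necessarily know the microphone state.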

In Chrome we currently expose UI for Media Session actions in various places, for example notifications on mobile, the global media controls on desktop, and the picture-in-picture window. We could add microphone/camera/hangup buttons to these UIs when the user is in a video call on a video conferencing website that supports these actions. For example, the user could click our "hangup" button on one of those browser UIs and the website could handle hanging up the video call.

Would we consider a single action for muting/unmuting microphone, something like "microphonemutechange", "inputmutechange", or "mutechange" maybe? I don't see how web developers would be interested in one action and not the other one.

navigator.mediaSession.setActionHandler("microphonemutechange", (details) => {
  // Handle muting the microphone.
  // And later...
  navigator.mediaSession.setMicrophoneMuted(details.muted);
});

Same thoughts go for turning on and off the camera video.

navigator.mediaSession.setActionHandler("cameravideochange", (details) => {
  // Handle turning on and off the camera video.
  // And later...
  navigator.mediaSession.setCameraVideoTurnedOn(details.turnedOn);
});

I'm fine with that too. My initial reasoning for having two actions was that it more closely mirrors other existing actions, e.g. having both play and pause.

It's unclear to me what the goal of these actions is. Could you expand on the use cases for this feature?

Sure. We currently expose UI for Media Session actions in various places, for example notifications on mobile, the global media controls on desktop, and the picture-in-picture window. We could add microphone/camera/hangup buttons to these UIs when the user is in a video call on a video conferencing website that supports these actions. For example, the user could click our "hangup" button on one of those browser UIs and the website could handle hanging up the video call.

I've updated the original comment to combine "mutemicrophone" and "unmutemicrophone" into a "togglemicrophone" action, and "turnoncamera" and "turnoffcamera" into a "togglecamera" action.

Thanks, that makes sense. If I understand correctly, these actions will let the website react to changes made outside the content area - either browser or platform UI. In other words, events flow one way, from outside the website to the website, but not the other way?

Correct. The actions ("togglemicrophone", "togglecamera", and "hangup") all flow one way, from outside the website to the website.

Thanks for clarifying!

One more question: how does the UA determine which websites to deliver such notifications to? I am guessing it's gated on some kind of announcement made by the website to the browser, essentially "I'm in a call now". Could a malicious website hijack the controls and prevent users from hanging up through the browser/system UI?

Typically, UI for the Media Session is already specific to a tab. For example, in the global media controls there is a set of controls for each tab currently playing video/audio, so if the user presses play/pause on a specific control, it goes to that specific tab. In the case of picture-in-picture, we know which tab the picture-in-picture video is from, so it goes there. So a malicious website can't see or take over controls for another tab.
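
As a rough illustration of that per-tab routing, here is a toy model. It is not how Chrome or any other browser is implemented; TabMediaSession, BrowserMediaControls, and the tab ids are all hypothetical names made up for this sketch. The point it tries to show is only that a button press is looked up by tab first, so a handler registered in one tab is never invoked for another tab's control.

```javascript
// Toy model of UA-side routing. All names here are hypothetical.
class TabMediaSession {
  constructor() {
    this.handlers = new Map(); // action name -> handler
  }
  setActionHandler(action, handler) {
    if (handler === null) {
      this.handlers.delete(action);
    } else {
      this.handlers.set(action, handler);
    }
  }
}

class BrowserMediaControls {
  constructor() {
    this.sessions = new Map(); // tabId -> TabMediaSession
  }
  sessionFor(tabId) {
    if (!this.sessions.has(tabId)) {
      this.sessions.set(tabId, new TabMediaSession());
    }
    return this.sessions.get(tabId);
  }
  // Called when the user presses a button on the control belonging to `tabId`.
  // Returns true if that tab had a handler for the action, false otherwise.
  press(tabId, action) {
    const session = this.sessions.get(tabId);
    const handler = session ? session.handlers.get(action) : undefined;
    if (!handler) {
      return false;
    }
    handler();
    return true;
  }
}

const controls = new BrowserMediaControls();
const events = [];

controls
  .sessionFor("call-tab")
  .setActionHandler("hangup", () => events.push("call-tab hung up"));
controls
  .sessionFor("other-tab")
  .setActionHandler("hangup", () => events.push("other-tab saw hangup"));

// The user clicks "hangup" on the control for the call tab only.
const delivered = controls.press("call-tab", "hangup");
```

After the press, only the call tab's handler has run; the other tab registered the same action but never sees the event.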

Sounds good, thanks!

Shouldn't "togglecamera" be "turn-on-camera" or similar (or at least "toggle-camera-button")? It could be mistaken for switching between cameras.

Or, since these are "actions", maybe just calling it "camera" would be enough. @beaufortfrancois?

setMicrophoneMuted(boolean muted)
setCameraTurnedOn(boolean turnedOn)

Maybe we should call it "set*Active" instead?

set*Active makes sense to me.

Re: "turn-on-camera":
My initial proposal actually used "turnoncamera" (with an additional "turnoffcamera" action), but two actions seemed unnecessary. We decided one action was better and "togglecamera" made the most sense to me. In my mind, "togglecamerabutton" doesn't make things much clearer, especially since no other media session action mentions a button.

Could we link those actions to the hardware buttons on USB headsets too?