w3c/mediasession

Dedicated video conference session API?

Opened this issue · 2 comments

Hi all,

I noticed that people are discussing adding video conferencing support to the Media Session API. I'm curious what you think about having a separate API dedicated to video conferencing.

The reason is that video conferencing and normal media playing have many different features. For example:

Things that video conferencing has but normal media playing doesn't

  1. Toggle background blur/replacement
  2. Raise hands
  3. Pin the current stream
  4. Toggle mic/camera
  5. Set the current speaker in the meeting
  6. Notify who has joined/left the meeting
    etc.

Things that media playing has but video conferencing doesn't

  1. (Fast) forward/backward
  2. Artist
    etc.

Considering that video conferencing has been a crucial application for the web, if we want to bring the best API support for video conferencing, maybe we should have a separate API. May I have your opinions about this?

Thanks!

> Considering that video conferencing has been a crucial application for the web, if we want to bring the best API support for video conferencing, maybe we should have a separate API.

I think this is a good question. #269 added togglemicrophone, togglecamera, and hangup, arguably a bit prematurely.
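For context, here is a minimal sketch of how a page might wire up those three actions. It assumes the Media Session methods setActionHandler, setMicrophoneActive, and setCameraActive; the CaptureSessionLike and CallState types and the registerCaptureActions helper are hypothetical names made up for illustration, not part of any spec.

```typescript
// The three capture-related actions that #269 added to Media Session.
type CaptureAction = "togglemicrophone" | "togglecamera" | "hangup";

// Minimal structural type for the parts of MediaSession used here, so the
// sketch does not depend on which actions a given set of DOM typings knows.
interface CaptureSessionLike {
  setActionHandler(action: CaptureAction, handler: (() => void) | null): void;
  setMicrophoneActive(active: boolean): void;
  setCameraActive(active: boolean): void;
}

// Hypothetical app-side call state; an assumption, not a spec concept.
interface CallState {
  micOn: boolean;
  cameraOn: boolean;
  inCall: boolean;
}

function registerCaptureActions(session: CaptureSessionLike, call: CallState): void {
  session.setActionHandler("togglemicrophone", () => {
    call.micOn = !call.micOn;
    // Report the new state back so the browser UI can reflect it.
    session.setMicrophoneActive(call.micOn);
  });

  session.setActionHandler("togglecamera", () => {
    call.cameraOn = !call.cameraOn;
    session.setCameraActive(call.cameraOn);
  });

  session.setActionHandler("hangup", () => {
    call.inCall = false;
    // Unregister the handlers once the call has ended.
    session.setActionHandler("togglemicrophone", null);
    session.setActionHandler("togglecamera", null);
    session.setActionHandler("hangup", null);
  });
}
```

In a browser that supports these actions, one would presumably pass navigator.mediaSession as the session argument (a cast may be needed depending on the TypeScript DOM typings in use).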

If we stop there, then a more conservative approach might be to start with a MediaCaptureSession API to cover that subset only, or, as I mention in #278 (comment), to try to decouple and solve routing with no change to the API.

Most computing devices come with a camera and mic these days, so having hardware/keyboard controls for these makes sense to me.

But a "video conference" is application specific — determining that "video conferencing" is happening can actually be quite hard in browsers today — Is it definable through a series of common resources or actions? Are these locked in to how people work already? Or might we be baking in assumptions about how apps work this year?

I confess I find your forward-looking suggestion appealing, but pragmatically the actions still sound a bit novel and perhaps context-dependent, rather than something already found in, e.g., hardware keyboards.

Thanks for the comments, Jan-Ivar. I agree this needs more thinking.

> But a "video conference" is application specific — determining that "video conferencing" is happening can actually be quite hard in browsers today — Is it definable through a series of common resources or actions? Are these locked in to how people work already? Or might we be baking in assumptions about how apps work this year?

I imagine that web apps would need to explicitly signal that a video conference is happening by using this API. In fact, this API could also be a way for the browser, or even the OS, to determine whether a video conference is in progress.
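To make the idea a bit more concrete, here is a purely hypothetical sketch of explicit signalling. None of these names (HypotheticalConferenceSession, setState, onChange) exist in any spec or browser; they are assumptions used only to show the page declaring the state and the browser or OS observing it rather than guessing.

```typescript
// HYPOTHETICAL: nothing below exists in any spec or browser today.
type ConferenceState = "none" | "active";
type ConferenceListener = (state: ConferenceState) => void;

class HypotheticalConferenceSession {
  private state: ConferenceState = "none";
  private listeners: ConferenceListener[] = [];

  // Page side: the app explicitly declares when a conference starts or ends.
  setState(next: ConferenceState): void {
    if (next === this.state) return; // ignore redundant declarations
    this.state = next;
    for (const listener of this.listeners) listener(next);
  }

  // Browser/OS side: observe the declared state instead of inferring it.
  onChange(listener: ConferenceListener): void {
    this.listeners.push(listener);
  }

  get current(): ConferenceState {
    return this.state;
  }
}
```

With something along these lines, the app would call setState("active") when the user joins a meeting and setState("none") when they leave, and the browser would not need heuristics to tell a call apart from ordinary capture.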

I am relatively new to the Media Session API. I am still reading and thinking about whether the video conferencing interfaces can be cleanly integrated into it. But as mentioned in my first post, because video conferencing is such an important application these days, my feeling is that it deserves a dedicated API if that is the only way to deliver the best user and developer experience.