immersive-web/webxr

State of immersive mode navigation for the WebXR Draft

AlbertoElias opened this issue · 77 comments

With the Navigation Design Doc in mind, what is happening with regards to immersive-to-immersive navigation?

#383 was closed. Are there further efforts to get something in the draft? I think many of us are really keen on this feature to not break the power links provide while in immersive mode.

An implementation where an interstitial controlled by the UA shows the link you're being forwarded to would be a really good start. The user would know the interstitial belongs to the UA through some kind of random image.

I'd like to help out with this as much as possible, let me know what I can do.

We've had discussions about this at Mozilla, too. @cvan @kearwood and others have been in those chats. We were also thinking about some sort of interstitial that the user can recognize as definitely being part of the UA.

cvan commented

I can prepare a write up of the current state of navigation in the VR browsers and possible recommended UX flows (at least for Firefox Reality). @caseyyee and I have designed and implemented page-to-page navigations in VR a few times, and there are definitely lessons we learned and security and browser compatibility principles that we can apply to and propose to the WebXR specs and explainers.

Let's add this to the agenda for the next Immersive Web discussion. And I will hop on the call to get the conversation going.

@cvan I think that write up would be really useful to get things rolling. It seems like the interstitial could be the way forward and having some backing for it would be great. Looking forward to reading it.

@AlbertoElias Thanks for kickstarting the conversation. This is an essential feature for Supermedium. As a browser exclusive for immersive content (we don't render 2d pages) we need both a mechanism for pages to enter VR mode on load and preserve it on navigation. A simple approach could be:

Going through UX flows as @cvan mentioned will be an important exercise to make sure the standard enables innovation in the user agent space. I would favor, though, as small a spec as possible that can be easily agreed upon and implemented, leaving browsers to experiment with advanced UX.

cvan commented

I have a schedule conflict for today's meeting. The discussion should be public and complete here. I will post our findings and recommendations in this issue.

@blairmacintyre and I discussed this yesterday and we both think this is an important topic to figure out soon. There are a few points we wanted to add:

We came up with two main use-cases for something like this, which could be called "UA-driven immersive sessions". I think this framing is important because it highlights both the backwards nature of the UA requesting you enter an immersive session rather than the other way around and it also highlights the connection between immersive->immersive navigation and automatic immersive entry in a case like putting your smartphone into a Cardboard or Daydream headset.

The sticking point for me is the code ergonomics of this feature with respect to session creation. It seems like we need a totally new session creation flow that is event-driven rather than request-driven. This impacts both the API shape of session creation and also how permissions might be handled and what constitutes a "user gesture".

Our straw-man proposal is that there is an event that gets issued by the UA like 'ua-driven-session' on the xrnavigator object or similar. This can be subscribed to by the page and would fire after page load in the case of immersive->immersive navigation or at any point in the case of something like entering a Daydream headset. This would present a session to the page.

More difficulties arise when we start thinking about session capabilities / constraints in this context. How do you ask for AR vs. VR? How do you distinguish between other important capabilities? The UA might give you a session you don't want or can't use. We probably need a hook for the page to tell the UA what session creation options it wants to use for such an automatic creation event.

Again, my main reason for figuring this out soon is how it could affect the session creation API. Maybe we need an out-of-band 'configureDefaultSessionCreationOptions(...)' call, and maybe that call should be the way session creation parameters are always handled to avoid having duplicate functionality with the normal session creation flow?

Blair and I were interested in others' thoughts, especially from @toji.

I think we can assume that a page requesting a session might fail, and the same can happen with "UA-driven immersive sessions". In that case, something similar to what we have with WebVR could work: pages listen for an event and at that point request the specific session (VR vs AR) they'd like. It wouldn't require reworking the current API.

A simple event sessiongranted that gives content permission to start presenting would suffice I believe. @lincolnfrog Would this cover your use cases?

navigator.xr.addEventListener('sessiongranted', function (evt) {
  // One could check for the type of session granted.
  if (evt.session.mode === 'immersive-vr') {
    // The event grants permission, so requestSession is not subject to the user action requirement.
    navigator.xr.requestSession({ mode: 'immersive-vr' }).then((session) => { /* start presenting with the session */ });
  }
});

Scenarios where this event would fire:

  • On load after navigation where previous page was already presenting. For 2D / traditional browsers to provide in-VR navigation.
  • On load in full immersive or AR browsers where all pages are expected to present.

@dmarcos - I think your sessiongranted event is headed in the right direction. Thanks for replying!

Some questions that occur:

  1. How does this interact with normal session creation? Would this event also fire when you normally request a session? I don't generally like for there to be multiple ways of the same thing happening if at all possible.

  2. How does this work with security? If I click on a link to another immersive experience from an immersive experience, am I forfeiting my opportunity to reject permissions? What if the link is a redirect or hidden in an object or something - I may not know the domain or anything about the experience until it's already too late.

I think Nell was correct in adding this to the next f2f - it's a complex topic.

avaer commented

My thoughts are mostly aligned with the idea @dmarcos proposed.

Essentially the case is "UA has preauthorized this session with this shape", which could be because:

  • we are coming from an existing session
  • the browser configuration only makes sense with that immersive session type
  • the user is in some secure context like an intranet domain and has already whitelisted permissions there

We probably need a hook for the page to tell the UA what session creation options it wants to use for such an automatic creation event.

As in Diego's proposal, I think the event itself can contain sufficient context to allow the experience to make that decision. e.g. these are the parameters in effect and you can use them as-is. Otherwise, the experience is free to request something different, which may trigger additional consent to the change.

  1. How does this interact with normal session creation?

I think this event should not fire when you start a session, but rather informs you of an existing session that should be captured.

  2. How does this work with security? If I click on a link to another immersive experience from an immersive experience, am I forfeiting my opportunity to reject permissions?

This sounds like an important UX concern for what the UA MUST do before handing out such an event, e.g. ensure that permissions can only be carried over per domain policy, or else the user must re-assent before the UA releases the event and allows session creation.
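
To make the "carried over per domain policy" idea concrete, here is a minimal sketch of the check a UA might run before releasing the event. It assumes that "domain policy" means a same-origin comparison, which nobody in this thread has specified. The function name and the 'carry-over' / 're-assent' labels are hypothetical and not part of any WebXR draft.

```javascript
// Hypothetical UA-side policy check; not part of any WebXR draft.
// Assumption: permissions carry over only when the origin is unchanged.
// Returns 'carry-over' when the navigation stays on the same origin,
// and 're-assent' when the user should be asked again.
function sessionPolicyForNavigation(fromUrl, toUrl) {
  const from = new URL(fromUrl);
  const to = new URL(toUrl);
  return from.origin === to.origin ? 'carry-over' : 're-assent';
}
```

Under this assumption, a scheme change (https to http) also counts as a different origin, so it would trigger re-assent as well.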

Another thing to think about is a multi-app case, in which permissions are not global but a property of a volume/prism. For that we'd need a mechanism for the UA to tell the experience what "session policy" is in effect, and give the experience the chance to bind to it or renegotiate a different one with the user.

Thanks for the feedback @lincolnfrog

How does this interact with normal session creation? Would this event also fire when you normally request a session? I don't generally like for there to be multiple ways of the same thing happening if at all possible.

There would be a unique way to request a session: navigator.xr.requestSession. The function has to be called in the context of a user action:

buttonEl.addEventListener('click', function () { navigator.xr.requestSession(...)... });

Or UA initiated:

navigator.xr.addEventListener('sessiongranted', function () { navigator.xr.requestSession(...)... });

How does this work with security? If I click on a link to another immersive experience from an immersive experience, am I forfeiting my opportunity to reject permissions? What if the link is a redirect or hidden in an object or something - I may not know the domain or anything about the experience until it's already too late.

I would include some text in the spec: "The UA MUST communicate to users that there's a navigation event happening." I would let UAs explore the UX specifics: interstitials, overlays, permission dialogs...

@dmarcos I like the direction of this above, and it seems to handle most of what I was imagining.

In particular, I was imagining that (as you described) the event would contain information describing the session, and while the app is free to request a different session, that will potentially fail (the UA may say "nope" if the user has requested the app start with a specific session, for whatever reason) or it may require user permission. Ideally, the event data field (the event.session.mode in your code snippet) might map directly to the parameters that could be passed to request session, making the "easy case" simple.
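
A rough sketch of that "easy case", assuming the event carries a session description whose mode field can be passed straight to requestSession. The helper names, the supportedModes list, and the xr parameter (standing in for navigator.xr) are all hypothetical. The requestSession({ mode }) shape follows the snippets in this thread, not a finalized API.

```javascript
// Hypothetical helper: derive requestSession options from a sessiongranted event.
// Assumption: evt.session.mode maps directly onto the `mode` option.
function optionsFromGrantedEvent(evt) {
  return { mode: evt.session.mode };
}

// Request the granted session as-is when the app supports it; otherwise reject.
// `xr` stands in for navigator.xr so the sketch can run outside a browser.
function handleSessionGranted(evt, xr, supportedModes) {
  const options = optionsFromGrantedEvent(evt);
  if (!supportedModes.includes(options.mode)) {
    return Promise.reject(new Error('Unsupported session mode: ' + options.mode));
  }
  return xr.requestSession(options);
}
```

In a page, this would be wired up as navigator.xr.addEventListener('sessiongranted', (evt) => handleSessionGranted(evt, navigator.xr, ['immersive-vr'])).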

@lincolnfrog I imagine that the logic in the event listener would largely be focused on checking the properties of the session before requesting it, but the promise result for all requestSession calls might just call the same wrapper method.

Regarding security: we've chatted about this a bunch at Mozilla, and as @modulesio suggests, we will want to document expected/required behaviors. For example, when following a link across domains, the UA should probably present some not-easily-spoofable interstitial that presents the user with information about the target of the link, and an opportunity to change the properties/permissions granted to the existing session. How that is managed ... I don't know. The brute force method is to drop back to the "non-immersive home" and display a popup on a web page; but other methods might be devised.

What @dmarcos proposes is also pretty aligned with what I was thinking and just giving examples of ways for UAs to notify users that a navigation event is taking place is enough for the first version of the spec.

Ideally, the event data field (the event.session.mode in your code snippet) might map directly to the parameters that could be passed to request session, making the "easy case" simple.

Yes please!

Pondering this, I would like to raise one issue with the direction of this proposal.

I would like to see us create an API that (as @NellWaliczek and @toji like to say) leads developers into a pit of success. In this case, that means that the most obvious way to implement a WebXR page will work properly in all contexts.

The "event" mechanism here does not do this. If a page doesn't bother to register for "sessiongranted" (or whatever we call it) events, and then handle them properly, it will not support smooth navigation or UAs that give users the ability to start immersive apps via some mechanism outside the page itself.

In Argon, we followed a pattern more closely aligned with current 2D systems: we decoupled the "request" for a session from the "notification" of a session being created.

I've mentioned this before, but will offer it up again. I would prefer to get rid of the promise associated with requestSession and instead do something like

navigator.xr.addEventListener('sessiongranted', function (evt) {
   // One could check for the type of session granted.
   // The event notifies of session creation after navigation, UA action, or requestSession.
   // The session object is provided as part of this event.
   if (evt.session.mode === 'immersive-vr') {
      // set up app state for immersive vr, if that's what the app wants
   } else {
      // notify user that this app only works in immersive vr mode, if desired
   }
});

[...]
  // on some user action, request a session ...
  navigator.xr.requestSession({ mode: "immersive-vr" }).then(() => { 
      // the promise is used to notify that the request succeeded or failed, 
      // or if something is wrong resulting in an error
      // The actual session is provided by the event above 
     [ ...] 
  });

The argument against this has been that it decouples the session creation into two bits of code (registering for the event and receiving the session), but I actually view that as a good thing. The UA is free to give the user more power, including eventually allowing sessions to be changed, started/stopped, etc., without the app having to manage this. The app will just receive new sessions when they are created.
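
To illustrate the decoupling, here is a small sketch with a stand-in object that can run outside a browser. FakeXR is hypothetical and only mimics the proposed behavior; it is not the WebXR API. The point is that a single listener receives sessions whether the UA or the page initiated them.

```javascript
// Hypothetical stand-in for navigator.xr; not the real WebXR API.
class FakeXR extends EventTarget {
  // Deliver a session to whoever is listening, whatever caused it.
  grant(mode) {
    const evt = new Event('sessiongranted');
    evt.session = { mode };
    this.dispatchEvent(evt);
  }
  // In this model the promise only reports success or failure;
  // the session itself arrives through the event.
  requestSession(options) {
    this.grant(options.mode);
    return Promise.resolve();
  }
}

const xr = new FakeXR();
const received = [];
xr.addEventListener('sessiongranted', (evt) => received.push(evt.session.mode));

xr.grant('immersive-vr');                    // UA-driven, e.g. after navigation
xr.requestSession({ mode: 'immersive-ar' }); // page-driven, on a user action
```

Both sessions land in the same handler, so a page that only registers the listener would also work when launched by the UA.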

I realize there are pros and cons to both, but I wanted to remind people of this option.

@blairmacintyre Thanks. Passing the session to the sessiongranted event makes total sense.

I don't see how this proposal satisfies the points that @toji made in the Navigation Design Doc.
If there is no browser chrome, how can a user be certain that they are going to the correct site?

avaer commented

I think Diego touched on the notification point ("The UA MUST communicate to users that there's a navigation event happening"), but this seems a platform-specific concern that goes beyond the notion of chrome.

For example, although some kinds of display notifications could be spoofed...

  • the UA can present personalized identification that the app would not know
  • the trust anchor could be user-initiated, such as a trusted button to go out to a "spatial HTTPs bar" which users are trained to expect on the platform
  • the trust anchor could be outside the headset entirely, if it makes sense on the hardware

The problem seems broader than which chrome pixels there are, so I'd propose language that puts the onus of the solution on the UA.

I think @modulesio hits it on the head. I know @cvan and others at Mozilla have thought a lot about this, perhaps someone can find some of the design documents and experiments we did, and share them?

@rcabanier I covered some of that ground in #517 (comment). Happy to elaborate specific concerns. To sum up:

  • Include in spec that UAs must clearly communicate navigation events in a non-spoofable way. Users can request the current location at any time.
  • To prevent spoofing: Supermedium reserves a long press on one of the buttons to invoke the browser UI. Content cannot map that input. As @modulesio mentioned, another way is requesting personal information on setup: choose an image, 3D object, avatar... This is not accessible by content and is shown in the User Agent interface.
  • I would leave UAs to choose, select, invent the specific trust mechanism. Probably too premature to spec prescriptive UX in this exploratory phase. Some guidelines, example mechanisms could be included.

I think @modulesio hits it on the head. I know @cvan and others at Mozilla have thought a lot about this, perhaps someone can find some of the design documents and experiments we did, and share them?

Yes, it would be good to have that information.

cvan commented

On it - will post here. Thanks, everyone, for keeping this conversation going.

@rcabanier I covered some of that ground in #517 (comment). Happy to elaborate specific concerns. To sum up:

  • Include in spec that UAs must clearly communicate navigation events in a non-spoofable way.

Can you give an example how this could be done?
Since the author has full control of the display, it seems this can be spoofed.

Users can request the current location at any time.

I don't think users are going to click a button to see if they are on the correct site. It needs to be obvious and always there.

  • To prevent spoofing: Supermedium reserves a long press on one of the buttons to invoke the browser UI. Content cannot map that input. As @modulesio mentioned, another way is requesting personal information on setup: choose an image, 3D object, avatar... This is not accessible by content and is shown in the User Agent interface.

Can you elaborate? Will the new site show this information and thereby "prove" it is non-spoofed?

  • I would leave UAs to choose, select, invent the specific trust mechanism. Probably too premature to spec prescriptive UX in this exploratory phase. Some guidelines, example mechanisms could be included.

It sounds like Mozilla already shipped a version and @cvan is going to post it here.

@cabanier Thanks for the questions.

Can you give an example how this could be done?
Since the author has full control of the display, it seems this can be spoofed.

An example we explored: on navigation, the UA could show an interstitial displaying the target URL, giving the user an opportunity to interrupt the transition and go back to the previous site. The interstitial would display user-selected personal information that is not accessible by content.

I don't think users are going to click a button to see if they are on the correct site. It needs to be obvious and always there.

That would be complementary to the above. Since there's no UA user interface always present, the user has a trusted way to invoke it at any time.

Not sure that an always present UI in immersive content (particularly in VR) is going to yield a good experience.

Can you elaborate? Will the new site show this information and thereby "prove" it is non-spoofed?

The personal information is shown by the UA UI (in interstitial, overlay or modal UI invoked by the user) and content has no access to it. It's a guarantee that the UI is legitimate and UA is not spoofed by a malicious site.

To prevent a site masquerading as another site, the UA provides the target URL via an interstitial, overlay or user-invoked UI.

I'd be careful about guarantees. According to this study (pdf), 73% of users still entered passwords when security images were missing. The UA should of course do its best to allow attentive users to stay secure, but finding an approach that is both reliable and unobtrusive seems very difficult.

A dedicated UA button that can't be intercepted can of course help. We could require using that button to confirm switching sites, but that would make the transitions clunkier.

I'd be careful about guarantees. According to this study (pdf), 73% of users still entered passwords when security images were missing. The UA should of course do its best to allow attentive users to stay secure, but finding an approach that is both reliable and unobtrusive seems very difficult.

I agree. Even though we all want seamless transitions, it will be hard to do this in a way that users are accustomed to.

A dedicated UA button that can't be intercepted can of course help. We could require using that button to confirm switching sites, but that would make the transitions clunkier.

Even a button can be spoofed :-(
A site could pretend to navigate to your bank...

Even though we all want seamless transitions, it will be hard to do this in a way that users are accustomed to.

Definitely true, but this could be a reason for UAs to start tackling the problem and start accustoming users to potential solutions. The alternative makes me think of Windows Vista UAC rollout...

Even a button can be spoofed :-(

As above, there are some that would be hard to spoof, by knowing things the page can’t know, or by being physical. Though I wouldn’t advocate for the spec prescribing something as specific as buttons yet.

FWIW, another variation of this we stumbled on. Modern browsers have the ability to "click" to follow a link, but also menu items to "open in new tab" or "open in new page".

If you see a link in an AR/VR display while not in immersive mode (perhaps on a 2D page, perhaps sitting in a virtual home or attached to the wall of your real-world room ... whatever), it seems reasonable to envision "open in immersive mode" (or whatever), that follows the link while going into immersive mode immediately. One could also imagine that the AR/VR equivalent of saving a web app to your home screen on a phone is to save it to your "world" and check a box that says "always open into 3D immediately", for example.

Regardless, the idea that you follow a link while immediately launching into immersive mode seems to fit in here:

  • you start in something that isn't WebXR immersive mode (2D web mode or platform-desktop mode)
  • the action of launch explicitly gives permission to go into immersive mode
  • from the page viewpoint, it is launched in immersive mode and gets an event (if we follow the above suggestions)

So, the receiving page follows the same protocol as immersive navigation, but the user has given permission as part of launching it.

And here's another scenario, which I just imagined as I was cruising the web.

I've got a website with lots of pages, let's say I'm a small company like Amazon or Ikea. In my backend system, I can create immersive versions of each page. A "random room" with the Amazon product on a nice table, or a predefined room showcasing each Ikea product.

When I visit a page in 2D, I can enter the immersive version. Or, perhaps I've entered an immersive room directly from another immersive experience (someone linked to an Ikea chair somewhere else in the multiverse.)

Within each of these "rooms", there are links to related products. Every object in the Ikea rooms is made by Ikea, and has its own page on their website. In the Amazon rooms, there are shelves with "the objects people bought after looking at this one" and "sponsored products".

Following these links leads to different environments.

The interesting question I have here is this: do we REALLY want to have to reload all assets when we visit the different rooms? For example, perhaps Amazon would really like to have the room structure stay the same, but the products disappear and new products appear? Or, Ikea may be happy to sometimes keep the same room when you click on another product; sometimes, they might want to swap it out.

Doing these kinds of things would be easy if it was "one web app and you never follow links". But how will this sort of experience be supported in WebXR?

The core feature of the web is the URI. I want to be able to save a link directly to an interesting room in Ikea or place/product on Amazon, just like I do now.

How's this going to work??

@blairmacintyre I believe no changes are required to accommodate your use case. If there's navigation, the mechanism would be the same as described above (sessiongranted event). To avoid reloading assets, one could implement a single-page application that changes the URL without a server-side request via the History API.
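
A minimal sketch of the single-page approach, assuming each room can be identified by a query parameter. The function names, the room parameter, and the loadRoom callback are hypothetical. history.pushState is the standard History API call; the history object is passed in so the sketch can run outside a browser.

```javascript
// Hypothetical helper: build a shareable URL for a room.
function roomUrl(baseUrl, roomId) {
  const url = new URL(baseUrl);
  url.searchParams.set('room', roomId);
  return url.toString();
}

// Swap the scene contents in place and record the new URL without a page load.
// In a page, `history` would be window.history.
function enterRoom(history, baseUrl, roomId, loadRoom) {
  loadRoom(roomId); // app-specific: e.g. swap products, keep the room structure
  history.pushState({ room: roomId }, '', roomUrl(baseUrl, roomId));
}
```

Because the page never unloads, the immersive session and already-loaded assets stay alive, while the address still points at a specific room that can be saved or shared.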

Regardless, the idea that you follow a link while immediately launching into immersive mode seems to fit in here:

  • you start in something that isn't WebXR immersive mode (2D web mode or platform-desktop mode)
  • the action of launch explicitly gives permission to go into immersive mode
  • from the page viewpoint, it is launched in immersive mode and gets an event (if we follow the above suggestions)

So, the receiving page follows the same protocol as immersive navigation, but the user has given permission as part of launching it.

This again suffers from the same issue that a user won't know if they can trust the page.
If someone sends me an email with a pretend link to my banking site and if I open it directly to immersive, I won't have a way to find out that this is a spoofed site.

Doing these kinds of things would be easy if it was "one web app and you never follow links". But how will this sort of experience be supported in WebXR?

The core feature of the web is the URI. I want to be able to save a link directly to an interesting room in Ikea or place/product on Amazon, just like I do now.

Ikea/Amazon could create a single-page web application that allows traversal of the site.
If the user finds or creates an interesting room, the site can create a unique URL that the user can save.

This again suffers from the same issue that a user won't know if they can trust the page.
If someone sends me an email with a pretend link to my banking site and if I open it directly to immersive, I won't have a way to find out that this is a spoofed site.

I don't think your analogy works. If you are looking at a message or email with a link to the spoofed site, and you explicitly take an action to launch the URL in a new page, the page will load and run -- with current web technology. Your action gave the browser explicit permission to open the page. It doesn't give it permission for other things (location, camera, etc), but it will load and run the page.

I was suggesting an explicit action in a browser ("Open in Immersive Mode"), akin to using the context menu on a link to say "Open in New Page" or "Open in New Tab". If someone sends me a link that is going to do something nefarious in immersive mode, I don't see how it could be made safer by forcing the page to first open in 2D mode (where, obviously, the nefarious page could present whatever spoofed 2D page, or just an "enter immersive mode to continue" button), and then requiring the user to click on the button. Both are a single action they take.

The "Open in Immersive Mode" could have whatever UI a browser wants, of course. One could show a popup ("Confirm you want to go to URL XXXX in immersive mode: yes / no / back").

@cabanier

To avoid reloading assets, one could implement a single-page application that changes the URL without a server-side request via the History API.

@dmarcos

Ikea/Amazon could create a single-page web application that allows traversal of the site.
If the user finds or creates an interesting room, the site can create a unique URL that the user can save.

Sorry I wasn't clear. If a company is willing to completely redo their site, including changing its fundamental architectural approach, to support XR, then we obviously don't need to do anything.

I'm assuming that for most companies, that's not really their first/preferred option. Perhaps Ikea/Amazon (big companies that hypothetically have infinite resources to throw away?) are not good examples. Consider, instead, a museum (which has a site with a page per item) or a small mom-and-pop store using some traditional CMS like WordPress that wants to drop in an "XR plugin" that they can configure per page.

I'd also posit that there are other reasons for actually loading new pages, such as the fact that most browsers do a pretty bad job of managing memory and avoiding bloat on active pages over a very long time, so page transitions still represent a nice way to clear out the entire state and "start over".

I'm not suggesting there is an obvious solution here. But "a company that wants to do this simple thing could just completely rearchitect itself" seems like a pretty unsatisfying answer. 😉

@cabanier Thanks for the input.

Navigation security concerns aside, the sessiongranted event described above covers other important use cases. Supermedium, as a VR-only browser (no 2D rendering), needs a standard way to grant immersive mode at page load (similar to the old WebVR vrdisplayactivate event). The sessiongranted event will also enable the start-in-immersive-mode scenario that @blairmacintyre described. The same mechanism is also forward compatible and can be used for in-VR navigation once there's a security model everyone feels comfortable with. At first, the spec would constrain sessiongranted to UA-initiated immersive sessions at page load and not navigation.

This again suffers from the same issue that a user won't know if they can trust the page.
If someone sends me an email with a pretend link to my banking site and if I open it directly to immersive, I won't have a way to find out that this is a spoofed site.

I don't think your analogy works. If you are looking at a message or email with a link to the spoofed site, and you explicitly take an action to launch the URL in a new page, the page will load and run -- with current web technology. Your action gave the browser explicit permission to open the page. It doesn't give it permission for other things (location, camera, etc), but it will load and run the page.

I believe this is a good analogy. If I browse to a spoofed website, I will be able to see in the URL bar that the site is the wrong URL. There is no such thing in WebXR.

I was suggesting an explicit action in a browser ("Open in Immersive Mode"), akin to using the context menu on a link to say "Open in New Page" or "Open in New Tab". If someone sends me a link that is going to do something nefarious in immersive mode, I don't see how it could be made safer by forcing the page to first open in 2D mode (where, obviously, the nefarious page could present whatever spoofed 2D page, or just an "enter immersive mode to continue" button), and then requiring the user to click on the button. Both are a single action they take.

Users have been told what to watch for so they can recognize untrusted websites. (For reference, my 8 year old son had a class that told him to look for the green lock and explained what the warnings meant)
So even though 2D has its issues, at least people are familiar with the pitfalls.
I'm sure that there are entire teams at the browser vendors that agonize over each minute user interaction.

The "Open in Immersive Mode" could have whatever UI a browser wants, of course. One could show a popup ("Confirm you want to go to URL XXXX in immersive mode: yes / no / back").

If the user has to deal with a popup anyway, how is a button to go immersive worse?

I believe this is a good analogy. If I browse to a spoofed website, I will be able to see in the URL bar that the site is the wrong URL. There is no such thing in WebXR.

I agree there are differences, but I think we're quibbling here. By the time you see the URL, you've loaded and run the site; and, as I said in my first reply, you are right about this, and we could (in theory) show you the URL before entering immersive mode during the link following.

If the user has to deal with a popup anyway, how is a button to go immersive worse?

Having a browser-chrome-level dialog asking for confirmation would be faster (don't need to load and run the site) and safer (don't need to load and run the site).

I wouldn't dream of suggesting that this is something that could be defaulted or done without an explicit user request.

Anyway, all of this is "browser level UI" discussion; any given browser would be free to handle as they desire.

The underlying mechanism to support it is the thing we need to put in WebXR (the ability to launch into a site in immersive mode) and that's what I'm really arguing for.

Here, I just saw a specific small case that could be implemented safely, in practice (via browser-level dialogs), since the interaction starts in 2D mode.

I'm sure that there are entire teams at the browser vendors that agonize over each minute user interaction.

Indeed.

I believe this is a good analogy. If I browse to a spoofed website, I will be able to see in the URL bar that the site is the wrong URL. There is no such thing in WebXR.

I agree there are differences, but I think we're quibbling here. By the time you see the URL, you've loaded and run the site;

Yes but that site won't be able to do anything while the user recognizes that it's not the correct site.

and, as I said in my first reply, you are right about this, and we could (in theory) show you the URL before entering immersive mode during the link following.

If the user has to deal with a popup anyway, how is a button to go immersive worse?

Having a browser-chrome-level dialog asking for confirmation would be faster (don't need to load and run the site) and safer (don't need to load and run the site).

Would the previous site keep displaying the last scene or would there be some sort of loading animation?
I have not tried supermedium or FF reality. How are transitions handled?

I wouldn't dream of suggesting that this is something that could be defaulted or done without an explicit user request.

Anyway, all of this is "browser level UI" discussion; any given browser would be free to handle as they desire.

The underlying mechanism to support it is the thing we need to put in WebXR (the ability to launch into a site in immersive mode) and that's what I'm really arguing for.

I'm worried that certain implementors are going to use this feature to allow browsing from site to site with no security checking, UA specific security checking or with some sort of UA managed whitelist. (From last week's meeting it sounded that FF Reality and Supermedium are already doing this.)

This is going to lead to broken experiences for users as some sites will work on FF reality but not on Chrome.

We already have lots of WebVR and WebXR sites that only work with certain UAs, hardware or controllers, so let's not make it worse. Does Supermedium content even work in any mainstream browser?

I'm unaware of Firefox (Reality or otherwise) supporting such navigation in WebVR, but I'll ask. (What gave you this impression?)

Certainly, browser implementers can "do the wrong thing" in any number of ways, which is why the major browser vendors have security teams looking at just these sorts of things. We've certainly spent a lot of time talking internally about this issue of immersive navigation.

Right now, a browser could give a page access to anything it wants without user permission, but they don't. Why is this different?

@cabanier @blairmacintyre Firefox, Oculus Browser, Samsung Internet and Supermedium have had in-VR link traversal for a couple of years in release channels. All are based on the WebVR vrdisplayactivate event, which was designed for navigation among other purposes.

@cabanier Supermedium is a standard browser and fully compliant with WebVR. We curate existing WebVR content and make it easy to discover. You can access our directory from any browser: https://webvr.directory/

I have not tried supermedium or FF reality. How are transitions handled?

Links are currently instant. You see the Steam or Oculus loading screen, and then you are in another VR world in a few seconds.

There currently is no interstitial, but that is because today's content ecosystem does not necessitate it yet, but of course we'd implement it once use cases arise. All current VR and WebVR content is experiential and not based on banking or shopping. As maintainers of A-Frame and having done 155 weeks of weekly content roundups of the WebVR content ecosystem, https://aframe.io/blog/, I have a pretty good idea where WebVR content is at today, and it's pretty far from heavy security-necessitating use cases.

Of course, as we start to support those use cases and the spec solidifies, security UI and UX are very solvable with the solutions mentioned above. You can try Supermedium or FF Reality; it helps to use the VR browsers today to get a feel for the state of things. But I feel that not having link traversal will hinder WebVR enough that it never evolves to those use cases.

I'm worried that certain implementors are going to use this feature to allow browsing from site to site with no security checking.

We can get the low-level feature in first; it's just an optional event that already exists in WebVR. I don't think UI needs to block the spec, but perhaps we can write up possible security recommendations into the spec for link traversal with your input (trusted interstitial with some non-spoofable info, allow trusting of domains, etc). I believe the small circle of implementors of the spec have security in mind.

This is going to lead to broken experiences for users as some sites will work on FF reality but not on Chrome. Does Supermedium content even work in any mainstream browser?

An optional event that can be listened to in case a site wants to auto-enter VR would not segment the VR Web. The vrdisplayactivate event today is very non-intrusive. There is also no concept of "Supermedium content", it's just a client that makes it easier to surf WebVR content that already exists. I recommend trying out Supermedium and FF Reality or other VR browsers, it will provide a better feel of the landscape of link traversal, VR browsers, and the VR Web today.
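For readers who have not used the event, a minimal sketch of what "listening in case a site wants to auto-enter VR" looks like on the content side. It follows the WebVR 1.1 shape (`vrdisplayactivate` carries a `display`, and `requestPresent` takes a list of layers with a `source`), but the `...Like` interfaces and `enableAutoEnterVR` are stand-ins defined here so the snippet is self-contained; whether the browser actually grants presentation is entirely up to the UA.

```typescript
// Minimal structural stand-ins so the sketch type-checks without WebVR typings.
interface VRLayerInitLike { source: unknown; }
interface VRDisplayLike {
  isPresenting: boolean;
  requestPresent(layers: VRLayerInitLike[]): Promise<void>;
}
interface VRDisplayEventLike { display: VRDisplayLike; }
interface WindowLike {
  addEventListener(
    type: "vrdisplayactivate",
    listener: (event: VRDisplayEventLike) => void
  ): void;
}

// Opt-in: pages that never register this listener behave exactly as before.
function enableAutoEnterVR(win: WindowLike, canvas: unknown): void {
  win.addEventListener("vrdisplayactivate", (event) => {
    if (event.display.isPresenting) {
      return; // already presenting, nothing to do
    }
    event.display
      .requestPresent([{ source: canvas }])
      .catch(() => {
        // The UA declined; the page stays in 2D and keeps its enter-VR button.
      });
  });
}
```

In a real page this would be called as `enableAutoEnterVR(window, canvas)` with the page's WebGL canvas.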

We should have the bare minimum low level hook to enable link traversal so that the space can be explored and iterated upon. For use cases off the top of my head:

  • Less performance worries. The Ikea or Amazon case is one instance. Having different web pages load is sort of beneficial for not having to deal with offloading / onloading new models and environments or making your own interstitial. You can avoid frame drops by letting the browser handle the transition, and VR web pages load quickly enough.
  • You have a game that contains links to your other games. Or links to your friends' games.
  • You have a personal blog that links to affiliated VR sites or your friends' blogs.
  • A teacher has an educational website for a class, with a link to the school's portal.
  • User generated content linking to one another. Like Rec Room, VR Chat.
  • Sponsored links. You have a VR world about photography and filmmaking that links to Photorama, B&H, GoPro's VR portals.
  • Content portals in general (directories of games, immersive videos, collections).
  • You have a personal storefront that links to products hosted by eBay VR or Amazon VR or whatever.

I wonder if there is something of CORS / CSRF that can be leveraged here.

@cabanier Supermedium is a standard browser and fully compliant with WebVR. We curate existing WebVR content and make it easy to discover. You can access our directory from any browser: https://webvr.directory/

How is supermedium fully compliant with WebVR but not supporting HTML?
From the spec:

If requestPresent() is called outside of an engagement gesture, the promise MUST be rejected unless the VRDisplay was already presenting.
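Read as a predicate, the quoted rule is small. The sketch below models only that one sentence; `PresentRequestContext` and `mayStartPresenting` are hypothetical names, and how a UA decides that it is "inside an engagement gesture" is left out.

```typescript
interface PresentRequestContext {
  inEngagementGesture: boolean; // e.g. while handling a click, tap or controller press
  alreadyPresenting: boolean;   // the VRDisplay is currently presenting
}

// true: the request may proceed; false: the promise must be rejected.
function mayStartPresenting(ctx: PresentRequestContext): boolean {
  return ctx.inEngagementGesture || ctx.alreadyPresenting;
}
```

A page that has just been navigated to in VR has neither a gesture of its own nor a presenting display, which is why this rule is at the center of the disagreement.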

How do you enter an immersive session if there is no button to click?

How do you enter an immersive session if there is no button to click?

@cabanier Sorry, I misworded. I meant that standard WebVR experiences work and don't have to target Supermedium. There are no proprietary APIs. You're right, we don't honor the initial user gesture. It's a browser UX decision that does not affect content. It's common practice like for instance mobile UAs not complying with video autoplay to save battery and data.

How is supermedium fully compliant with WebVR but not supporting HTML?

There's no API for WebGL pages to display HTML. All WebVR content is subject to that constraint.

How do you enter an immersive session if there is no button to click?

@cabanier Sorry, I misworded. I meant that standard WebVR experiences work and don't have to target Supermedium. There are no proprietary APIs. You're right, we don't honor the initial user gesture.

So you are not compliant with WebVR :-)

It's a browser UX decision that does not affect content.

Maybe not content but it certainly touches privacy/security. If WebVR/XR suddenly becomes successful and people use FF reality or Supermedium to visit their banking or social media website, they will have no way of knowing that they can trust what they see.

It's common practice like for instance mobile UAs not complying with video autoplay to save battery and data.

Sure, but UAs wouldn't do it to make an end run around a security feature.
Also, a spec should allow for this type of behavior.

How is supermedium fully compliant with WebVR but not supporting HTML?

There's no API for WebGL pages to display HTML. All WebVR content is subject to that constraint.

I think this means that you can't implement the spec since you have no "engagement gestures". I don't know how you can resolve this...
You should probably offer sites a custom API (and use a whitelist?) instead of ignoring parts of the spec.

Maybe not content but it certainly touches privacy/security. If WebVR/XR suddenly becomes successful and people use FF reality or Supermedium to visit their banking or social media website, they will have no way of knowing that they can trust what they see.

As mentioned, current generation of content does not require high levels of security. As banks and shopping get on board with VR, it's easy to introduce any of the mechanisms described above. Incentives are already aligned. Insecure browsers won't survive long in the market.

Supermedium, at the moment, has a reserved interaction that cannot be mapped by content to invoke the browser UI and obtain information about the current site. This will evolve based on devs and user feedback.
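As a rough illustration of what a reserved interaction means mechanically (this is not Supermedium's implementation, which is not described in this thread): the UA looks at each controller press before content does, and presses on a reserved button open the browser UI instead of being delivered to the page. `ControllerPress`, `InputSinks` and `routeControllerPress` are hypothetical names.

```typescript
interface ControllerPress { button: string; }

interface InputSinks {
  showBrowserUI(): void;                        // hypothetical UA-owned overlay
  deliverToPage(press: ControllerPress): void;  // normal content input path
}

function routeControllerPress(
  press: ControllerPress,
  reserved: ReadonlySet<string>,
  sinks: InputSinks
): "ua" | "page" {
  if (reserved.has(press.button)) {
    sinks.showBrowserUI();
    return "ua"; // content never observes a reserved press, so it cannot remap it
  }
  sinks.deliverToPage(press);
  return "page";
}
```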

I think this means that you can't implement the spec since you have no "engagement gestures". I don't know how you can resolve this...
You should probably offer sites a custom API (and use a whitelist?) instead of ignoring parts of the spec.

That would indeed fragment the Web

I really think we shouldn't get into a browser war here. I'm sure all of us are perfectly aligned on the importance of security.

This feature doesn't make it insecure, we have already provided different ideas that UAs can experiment on for security, and I think we all agree these should be mentioned on the spec.

This is a feature that's critical for some UAs and for many content creators as well, and for our users as we want them to have the best UX possible. It's a simple optional addition for content creators with many use cases like @ngokevin demonstrated.

@AlbertoElias agreed, thanks for chiming in.

@cabanier I'm not quite sure why you keep saying that Firefox Reality is somehow going to be insecure for WebXR. If you have specific concerns about our plans, I (or others on the team) would be happy to have a discussion with you offline.

The current implementation of an agreed-on-at-the-time WebVR behavior has nothing to do with our plans for WebXR going forward. And our plans for how the UA will behave are (as @AlbertoElias points out) very much concerned with security and privacy of end users.

To recap: the ability to navigate between URLs is a basic requirement that needs to be sorted out. It is quite clear that the UA needs to ensure users understand what is happening and give informed consent. There are a variety of ways this can happen, from always popping out of the immersive experience during link traversal, to integrating with the underlying platform in other ways that ensure users understand what's happening. But if we are going to give UAs the ability to support link traversal at all, some navigation mechanism must be defined.

Can I make a motion to open a proposal repo on this? This thread has become quite long.

It seems a proposal should deal with two things:

  • define the mechanism for navigation and link traversal (i.e., like the events discussed above)
  • lay out the responsibilities of the UA in terms of user understanding and consent, if they choose to implement it.

If anyone disagrees that we need to have a method to support navigation, please say so. If you agree that there is sufficient agreement on having this capability and we should move this to a proposal, please say so as well. We need closure on this.

I definitely agree

I agree. I’m happy to volunteer to create a repo with an explainer and API proposal based on the event discussed above.

I really think we shouldn't get into a browser war here. I'm sure all of us are perfectly aligned on the importance of security.

I don't understand why you say this is a browser war.
This is about implementing a spec correctly so users and authors are protected

This feature doesn't make it insecure, we have already provided different ideas that UAs can experiment on for security, and I think we all agree these should be mentioned on the spec.

Experimenting is fine. Putting an unrefined/ill-defined feature in the spec is not.

@cabanier I'm not quite sure why you keep saying that Firefox Reality is somehow going to be insecure for WebXR. If you have specific concerns about our plans, I (or others on the team) would be happy to have a discussion with you offline.

I think I voiced specific concerns several times in this thread.
Responses so far were

  • we already had it in WebVR so what's the big deal
  • we're going to solve it with some sort of completely new UI on the platform
  • current content does not require security

The current implementation of an agreed-on-at-the-time WebVR behavior has nothing to do with our plans for WebXR going forward. And our plans for how the UA will behave are (as @AlbertoElias points out) very much concerned with security and privacy of end users.

To recap: the ability to navigate between URLs is a basic requirement that needs to be sorted out. It is quite clear that the UA needs to ensure users understand what is happening and give informed consent. There are a variety of ways this can happen, from always popping out of the immersive experience during link traversal, to integrating with the underlying platform in other ways that ensure users understand what's happening. But if we are going to give UAs the ability to support link traversal at all, some navigation mechanism must be defined.

@toji 's excellent explainer doc very specifically went over the pitfalls of this navigation problem. If you can find a solution that addresses Google's concerns that would be great.

we already had it in WebVR so what's the big deal

This is a pretty blunt and strawman-like form of summarization. I think you may be trying to summarize a point that many of us have had experience with link traversal in production, and are using that experience to propose what it might look like in a new spec.

we're going to solve it with some sort of completely new UI on the platform

Well, yes, VR browsers will need new UI.

current content does not require security

This leaves out everything else that has been said. There have been maybe half a dozen different, valid, and implementable proposals on what security on link traversal might look like. Web spec shouldn't infringe upon that high of a level. From the official WebXR explainer doc:

"Non-Goals: Define how a Virtual Reality or Augmented Reality browser would work"

And also stated in the official WebXR explainer doc: "The web’s transient nature makes these types of applications more appealing, since they provide a frictionless way of viewing the experience." Link traversal provides frictionless browsing of VR experiences without fumbling through HTML with controllers.

Another reason left out is the many use cases of link traversal that have been listed. Before you were concerned that link traversal would fragment the Web. But we've seen first-hand that not having link traversal / session granting has fragmented the VR Web. Half of content requires a 2D browser UI / portal to enter VR and half does not. It also puts every piece of VR content on their own island akin to an app store if they can't interlink.

excellent explainer doc very specifically went over the pitfalls of this navigation problem. If you can find a solution that addresses Google's concerns that would be great.

Most of @toji's concerns in the navigation explainer doc are either very solvable (potentially through solutions specifically listed several times in the thread), make assumptions about the future, are present on the 2D Web today (e.g., spoofing, spam, ads even with all security features), or are not worth restricting the VR Web over. An event to grant a VR session is not going to be the gateway for spoofing, spam, and ads.

There are many viewpoints and input to consider from several angles (content producers, users, implementors, headset makers, library authors). I'm happy those concerns are listed so we can delve into how to address them or debate their validity. We shouldn't kill links on the Web's 30th birthday, yeah? On the topic of having an official proposal / explainer doc for navigation, I'm in favor and we can hash out the details there. To note again, in the official WebXR explainer doc:

"Non-Goals: Define how a Virtual Reality or Augmented Reality browser would work."

n5ro commented

I have spent months writing WebXR to create this XR Magazine; see the video preview here: https://www.facebook.com/worksalt/videos/2694567600569870/

The present and future XR journalism profession would suffer tremendously if we did not have a way to link new articles as new pages in our WebXR sites. That's how an online XR magazine is built these days. Each article has its own page with its own memory constraints and its own stats, like traffic monitoring. Each page includes links to previous articles, like a nested doll, so new readers keep finding previous articles and the publication just grows and grows.

We need to be able to link not only nested trees of articles but also endless worlds across massive servers containing hundreds or thousands of pages, spatially linked together but each holding only about as much content as the average web page can handle in terms of page memory. In addition, users need to be able to walk with each other across and between these pages with their virtual avatars, following each other from page to page and meeting up in various locations, perhaps working together to solve a puzzle that the magazine's authors have spread across many pages.

In my opinion, a magazine ought to be able to present the link traversal disclaimer on the magazine's own website, in advance and up front. The user is notified that they may be traveling through many pages on the magazine's website as part of the intended experience, and that they will ONLY be notified if they are presented with a link that will result in them leaving the current page, which is what Facebook does: "if you follow this link you will be leaving Facebook". As long as the page itself isn't hacked, isn't owned by a malicious actor, and is secure, there should be no security issue with this arrangement.

A bank could easily follow this structure, notifying users if they are about to leave the bank's server, but not notifying users about link traversal if it's another page on the bank's secure website.
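A minimal sketch of the "only warn when leaving" rule, assuming "leaving" means that the link's origin (scheme, host and port) differs from the current page's origin. That reading is an assumption; a UA could just as well compare registrable domains or consult a trust list. `leavesCurrentOrigin` is a hypothetical helper, and it only uses the standard `URL` constructor.

```typescript
// Returns true when following `href` from `currentUrl` would change origin,
// i.e. when the UA would show its "you are leaving this site" notice.
function leavesCurrentOrigin(currentUrl: string, href: string): boolean {
  const current = new URL(currentUrl);
  const target = new URL(href, current); // resolves relative links against the page
  return target.origin !== current.origin;
}
```

With this rule, `/issue/2` followed from `https://magazine.example/issue/1` stays silent, while a link to `https://other.example/` (or to the same host over plain `http:`) would trigger the notice. As noted in the comment itself, the check says nothing about whether the current site is trustworthy in the first place.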

n5ro commented

Another thought: if an XR website is secure AND it has a certificate and it is trusted, then what is the ethical obligation, if any, to notify the user of link traversal? Let's say that you have an XR mall of multiple trusted partners, something like Supermedium but with HTTPS and certificates, and maybe an address bar on the controller that allows a user to add a plug-in that notifies them of link traversals, blocks popups, or lets them bookmark something. A nav bar built into the VR controllers that can be modified by browser plug-ins would be awesome. Imagine that everything that's on top of 2D Firefox/Chrome was built into the WebXR controller, modifiable by the user, independent of the page.

n5ro commented

In the case of handtracking the address bar and add-ons and notifications might be hidden under the hand or on top of the hand by user preference

n5ro commented

I think it would be great if the browser managed controllers independent of the page but that users could download my add-ons to upgrade their controllers with my customizations and menu options, the add-ons would be available in a pre-screened app store like add-ons are currently

I think there are valid points from all sides on this issue. We built a link-traversable metaverse with blockchain/IPFS to provide a reputation-based security mechanism. People won't understand these new technologies are web-based; they will download or use whatever is most convenient for them. You can't build a spec that creates a terrible user experience and expect your clients, the users of the web at large, to vote for it by using it. Your only choice here is to work to provide intuitive, user-centric UI to facilitate these traversals with acceptable security, or they will adopt other ways that can easily be less secure. After all, the dApp that runs dlux.io/vr only uses public API and runs without credentials anywhere with node.js ... a step away from being a new "browser" ... as I see it your only option is to build the best one.

@disregardfiat those are good points. If the web doesn't do what users want, they aren't going to use it.

My second worry is that, if we don't define a navigation approach, then browsers are going to implement something without having a larger discussion about what's best. We'll end up in a situation like the never-standardized, now-deprecated motion APIs; people will build whole web ecosystems based on something that eventually goes away and breaks a lot of things.

Or worse, folks will just start using things like Exokit; it is very appealing, has great features, and a lot of people yammering about how it's "the way the immersive web should work". But, it has no security, giving apps complete access to the underlying platforms. It's great for prototyping, of course, but I'm amazed at the number of folks who've argued to me that it's the way the immersive web should be built.

@blairmacintyre I completely agree.
Let's get a proposal out that covers the navigation behavior.
I think we don't need to discuss an API yet. (It could be as simple as a <meta> to keep the session going and allowing requestSession under certain circumstances.)

My second worry is that, if we don't define a navigation approach, then browsers are going to implement something without having a larger discussion about what's best.

My main worry is that the group decides on something insecure that will never be implemented by a mainstream browser.

I'm unsure if people want me to respond to all their messages. Please ping me if so.

@cabanier I understand your concerns. I feel there’s already pretty good alignment with the navigation document, API proposal and above discussion. I would love to hear what Google folks think.

Having an API keeps the conversation focused and helps move towards shipping. Timing is good. There’s tons of experience we’ve gathered since WebVR was first drafted and implemented almost 5 years ago.

n5ro commented

From reading various threads on WebXR on GitHub, there seems to be a small number of people who fear that there are whole new vulnerabilities opening up in WebXR that do not exist in the traditional web, such as the idea that someone could fake your web experience so that you are logging into your bank when you think you are playing a game. Another fear is that in social VR situations a hacker could sit in on your private VR chat as an invisible observer. However, I think that if folks really think about these potential vulnerabilities on a deep level they will eventually conclude that there is NOTHING you can do in WebXR that you can't also do in the flat 2D web. There is no malicious hack for WebXR that isn't also a vulnerability for the 2D web. The security for both WebXR and the 2D web needs to be the same.

From reading various threads on WebXR on GitHub, there seems to be a small number of people who fear that there are whole new vulnerabilities opening up in WebXR that do not exist in the traditional web, such as the idea that someone could fake your web experience so that you are logging into your bank when you think you are playing a game. ... There is no malicious hack for WebXR that isn't also a vulnerability for the 2D web

Did you read the explainer, especially the section on Navigation & User Expectations?

n5ro commented

"Spoofing: could imitate browser or OS/platform UI(...)
"Spam: Obnoxious immersive ads (...)
"Malicious actions: Users could be "rickrolled"(...)
"Well-intentioned but undesirable content:(...)

Honestly I hadn't, but now that I have, this actually makes my point even stronger: anything you can do in WebXR can also be done in the 2D web, and any security vulnerabilities in one apply also to the other.

It might be 3D for human eyes but it's still 2D code, and any vulnerabilities that we have in WebXR we also have in the 2D web and vice versa.

@cabanier I understand your concerns. I feel there’s already pretty good alignment with the navigation document, API proposal and above discussion. I would love to hear what Google folks think.

I think the explainer speaks for itself. Your API proposal does not address their concerns.

Having an API keeps the conversation focused and helps move towards shipping. Timing is good. There’s tons of experience we’ve gathered since WebVR was first drafted and implemented almost 5 years ago.

Unfortunately, things change when you move from a couple hundred thousand people to over 2 billion

An API actually distracts from the actual problem: how can we ensure that people know what site they are on?

n5ro commented

"When the user is navigated to a new page, they need an opportunity to evaluate the page, its URL, especially the origin,"
so control the controller/gaming pad inputs at the browser level, including hand tracking, to put an address bar, bookmark buttons, and other browser buttons on the virtual controller/ or virtual hands

It might be 3D for human eyes but it's still 2D code, and any vulnerabilities that we have in WebXR we also have in the 2D web and vice versa.

That is incorrect. When you visit a malicious site in 2D mode, it has no control over the url bar or anything outside of your browser window. With WebXR, it has complete control.

n5ro commented

The browser could detect if a plane is in between the user's camera and the browser's address bar on the controller, and either warn the user or disable the experience. Problem solved. Think of it like this: instead of the browser being outside the web page, it's inside the web page, and either way the browser can prevent itself from being overlaid and hidden.

Also think of how not having browser navigation buttons accessible inside the WebXR experience actually makes the experience worse: if you want to bookmark a page or view the console, you currently have to exit the WebXR experience. Having a browser-controlled controller in XR that you know is secure, and that can't be overlaid or hidden by the WebXR application, is a great security feature and one I think users will prefer.

It would also be nice to have a distinctive skybox light whenever link traversal happens, as a signal for users to check their address bar, and a full-screen mode option so users could hide the browser controls if they wish (and perhaps rely on skybox notifications for unexpected link traversal or other security warnings, such as insecure websites, known malicious domains, or sites without certificates).

If you pick up something in WebXR, the browser-controlled controller would have an "always on top" feature that shows the address bar on the controller even through whatever you are picking up, making that part of the object transparent and perhaps creating a little warning tag that some entity is between the user and the address bar. It's just an idea.

The features I am talking about would live in a sort of browser-controlled layer that sits on top of anything else written in WebXR, so that WebXR developers cannot override them. If the sky was too bright, for example, the distinctive skybox signal for link traversal would be a shadow pattern instead of a light pattern.

The user would feel confident that they could check their current address and their security notifications bar at a glance, assured that nothing written by the WebXR developer could come between them and their browser controls without disabling the experience, warning them, or becoming transparent. A user could also prefer to receive audio notifications instead of visual notifications, reporting address bar changes and security notices.
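The "is something between the camera and the address bar" check is, geometrically, a segment-versus-surface test. A minimal sketch, assuming the page's geometry is available to the browser as triangles (a quad being two of them) and using the Möller–Trumbore intersection test restricted to the segment from the eye to the address bar. `Vec3`, `Triangle`, `segmentHitsTriangle` and `isAddressBarOccluded` are hypothetical names, and a real UA would more plausibly solve this in its compositor than by walking page geometry.

```typescript
type Vec3 = [number, number, number];
type Triangle = [Vec3, Vec3, Vec3];

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];

// Möller–Trumbore test, limited to the open segment eye -> target.
function segmentHitsTriangle(eye: Vec3, target: Vec3, tri: Triangle): boolean {
  const EPS = 1e-9;
  const dir = sub(target, eye);
  const e1 = sub(tri[1], tri[0]);
  const e2 = sub(tri[2], tri[0]);
  const p = cross(dir, e2);
  const det = dot(e1, p);
  if (Math.abs(det) < EPS) return false; // segment is parallel to the triangle
  const inv = 1 / det;
  const s = sub(eye, tri[0]);
  const u = dot(s, p) * inv;             // first barycentric coordinate
  if (u < 0 || u > 1) return false;
  const q = cross(s, e1);
  const v = dot(dir, q) * inv;           // second barycentric coordinate
  if (v < 0 || u + v > 1) return false;
  const t = dot(e2, q) * inv;            // 0 at the eye, 1 at the target
  return t > EPS && t < 1 - EPS;
}

// True if any page triangle lies between the eye and the address bar position.
function isAddressBarOccluded(eye: Vec3, addressBar: Vec3, pageTriangles: Triangle[]): boolean {
  return pageTriangles.some((tri) => segmentHitsTriangle(eye, addressBar, tri));
}
```

A single segment to one point is the simplest possible version; covering the whole address bar would need several sample points or a proper visibility test.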

@n5ro in the future please don't spam the message threads; it makes it difficult to reply and maintain constructive conversation. Instead, edit your previous message to include your new points, or even better, draft your reply in advance and post it in one go.

kfarr commented

(I am only a hobbyist webxr developer, usually using A-Frame, so you can discount my feedback as I'm not a framework or browser developer.)

My simple feedback is that the core of the web is all about links, and if I'm in VR I would expect to stay in VR if I "click" on a link. Consider this (imperfect) analogy of early text browsers in unix shells. Following links from page to page in the same "user session" was one of the very few features offered. What if instead of following links it had dumped you back on the shell and displayed the URL of the link you clicked on, so that following a link required the (more secure) step of reloading the page in a new browser session?

Yes, that would work, but instead some wise people took the route of letting the user agent decide how to preview a link in the browser and allow the user to traverse it in the same "user session." The rest seems like details that I trust the group on this thread can figure out, and in fact appears to already have figured out through research and production implementations.

Yes, that would work, but instead some wise people took the route of letting the user agent decide how to preview a link in the browser and allow the user to traverse it in the same "user session."

@kfarr please read the explainer why this is problematic

The rest seems like details that I trust the group on this thread can figure out, and in fact appears to already have figured out through research and production implementations.

Unfortunately, current implementations violate basic security.
We need actual research (with user trials, etc.), which will be expensive and time-consuming.

I’d like to hear from folks at Oculus, Microsoft, Samsung and Amazon about the proposal to move this discussion to a separate proposal repo so we can continue this discussion in a more structured way.

Hey, folks. There are a lot of pieces to this discussion and it feels like we're talking past each other.

Let's all be extra careful with tone in future comments.

I'll talk with the leadership team about where to take this, be it a repo or a call or some other venue that will let us make more progress with less heat.

Sorry for being super late to the conversation. As was already mentioned, Oculus Browser supports vr-to-vr navigation in WebVR via displayactivate event and this capability is being used by many existing WebVR experiences (well, if you can apply "many" to several well-known WebVR experiences). We are super interested in adding vr-to-vr nav to WebXR (something tells me Amazon should be interested too :) ). So, yeah, let's do a proposal for vr-to-vr navigation; sounds to me like the right thing to do.

This thread has grown very large, so I have set up a repo to continue this discussion:
https://github.com/immersive-web/navigation

There are many points raised in this thread which should be explored in issues in this repo.

Please keep discussion friendly and welcoming. This will likely be the first port-of-call for new people interested in navigation in immersive experiences.

If you want to add documents, explainers, or research to the repo please do. You can assign me (@AdaRoseCannon ) to review pull requests.

For those arriving at this issue: the conversation has moved to its own repo: immersive-web/navigation#2

Closing this issue until a concrete proposal is ready to discuss based on the work in the new repo.