Handheld AR use cases need more than immersive-ar
tangobravo opened this issue · 2 comments
Hi all!
(I posted an introduction to me and my background on the public WebXR mailing list here)
It would be very useful for many handheld AR use cases to have access to native ARCore / ARKit functionality outside of an immersive-ar session. I'm planning on making the general argument in this issue, and then I'll separately post a couple of specific suggestions for how this could be exposed via WebXR.
immersive-ar is clearly necessary for AR headsets; the WebXR API is perfect there to expose the multi-view metadata and to optimise for latency (i.e. allowing for late warping) by deferring to the XR compositor for presentation of the content.
I also think a full-screen view is a perfectly reasonable choice to provide immersive-ar sessions on handheld devices offering native AR tracking capabilities. It's a great solution for anyone looking to make their headset-first content available to a much wider audience or wanting an approach that will be as device-agnostic as possible.
However, I don't think immersive-ar should be the only way to access native tracking in mobile browsers. For many real-world handheld AR use cases it would be valuable to expose this capability (with appropriate privacy protections of course) whilst allowing the site to maintain full control of how the camera view and frame-synchronized content is presented on screen.
Many issues that WebXR solves for headsets are not required for handheld AR - there's only one view to render, and latency is not as big a deal (lower is better of course, but there's no late-warping, and an extra frame for the browser page compositor to run is likely to be acceptable).
Here are a few reasons why I think immersive-ar alone isn't a perfect fit for handheld AR use cases:
immersive-ar session gives more "layout power" on headsets, and less on mobile
This is for me the fundamental difference. Headsets render standard webpages into an emulated 2D plane somewhere in the user's environment. On these devices, an immersive-ar session allows content to break out of this emulated 2D box and have complete freedom to show content all around the user in 3D space. It's a massive increase in freedom for how things are laid out. The downside, of course, is that standard CSS layout doesn't really make sense in "free space", so the content all needs to be rendered via the XRSession - when your site is in an immersive-ar session, everything shown is outside of the DOM.
For handheld AR, the "layout power" is reversed; immersive-ar content on mobile is still in the end just shown on a 2D screen and displays 3D content from a particular viewpoint. Before entering a session the DOM is shown on screen, and 3D content from a particular viewpoint can be shown in a webgl canvas laid out in the DOM under the full control of the site. Entering the immersive-ar session involves giving up control of how that content is presented on screen relative to other elements of the DOM.
In short - on headsets immersive-ar allows your content to break out of a 2D box into the 3D world, but on mobile it actually locks your webgl content into a particular 2D box relative to the screen, which wasn't the case before entering the session.
Justification for "mode switch" is less clear
On mobile it's hard to justify why entering into a WebXR session for handheld AR needs to involve a "mode switch" in the layout and the hiding of the DOM. On headsets of course this is required.
There's already a Full-Screen API on mobile
The web already includes a Full-Screen API - any canvas can go full-screen when the site wants with an appropriate user gesture. As a user I found it surprising on mobile that an "Enter AR" button resulted in full-screen presentation.
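As a minimal sketch of that point, going full-screen can already be a separate step that the site controls. The helper name enableFullscreenOnTap and the two structural interfaces are hypothetical and only there to keep the example self-contained; requestFullscreen() is the standard Fullscreen API method, which browsers only honour in response to a user gesture such as a tap.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
// In a page, a button element and a canvas element would be passed in.
interface ClickTarget {
  addEventListener(type: "click", listener: () => void): void;
}
interface FullscreenCapable {
  requestFullscreen(): Promise<void>;
}

// Hypothetical helper: wire a button so that tapping it puts the canvas
// full-screen. The request is made inside the click handler because browsers
// require a user gesture for it.
function enableFullscreenOnTap(button: ClickTarget, canvas: FullscreenCapable): void {
  button.addEventListener("click", () => {
    canvas.requestFullscreen().catch(() => {
      // The request can be rejected (for example if full-screen is not
      // allowed); the canvas then simply stays where it is in the page layout.
    });
  });
}
```

Nothing here is tied to AR: the same canvas could show tracked content in its normal place in the page and only go full-screen if the user asks for it.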
Mobile aspect ratios have become more extreme
Most modern devices from the last few years use notches or punch-hole cameras and have really small bezels on all sides. When system navigation and status bars are hidden, the aspect ratio is pretty extreme (my Pixel 4A has a 2.16 aspect ratio, significantly wider than 16:9 video at 1.78). The crop-to-fill used for full-screen immersive-ar restricts the FOV of the visible portion of the camera feed quite significantly.
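To put rough numbers on the crop, here is a small sketch of the geometry. It assumes a simple pinhole camera model and a crop-to-fill placement equivalent to CSS "object-fit: cover"; the function names, the 3:4 portrait camera image, and the 60 degree full field of view used in the usage note are all illustrative assumptions, not values taken from ARCore or any browser.

```typescript
interface Size {
  width: number;
  height: number;
}

// Fraction of the source image that stays visible along each axis when the
// image is scaled to cover the viewport and the overflow is cropped.
function visibleFraction(image: Size, viewport: Size): { x: number; y: number } {
  const imageAspect = image.width / image.height;
  const viewportAspect = viewport.width / viewport.height;
  if (viewportAspect > imageAspect) {
    // Viewport is relatively wider: full width visible, top and bottom cropped.
    return { x: 1, y: imageAspect / viewportAspect };
  }
  // Viewport is relatively taller: full height visible, left and right cropped.
  return { x: viewportAspect / imageAspect, y: 1 };
}

// Field of view left after cropping one axis to the given visible fraction.
// Under a pinhole model, tan(FOV / 2) scales linearly with the visible extent.
function croppedFovDegrees(fullFovDegrees: number, fraction: number): number {
  const halfFullRadians = (fullFovDegrees * Math.PI) / 360;
  const halfCroppedRadians = Math.atan(Math.tan(halfFullRadians) * fraction);
  return (halfCroppedRadians * 360) / Math.PI;
}
```

With a portrait 3:4 camera image, visibleFraction reports that roughly 62% of the image width survives on a 1:2.16 screen, against 75% on a 9:16 screen. Passing that fraction and an assumed 60 degree horizontal FOV to croppedFovDegrees shows how much narrower the visible view becomes.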
Notches/punch-holes are annoying
My Pixel 4A has a punch-hole front camera in the top-left. In the WebXR DOM Overlay samples, that means a portion of the overlay (the start of the title) isn't visible.
It's a non-goal of the DOM overlay extension to allow low-level control of the placement of the overlay rectangle to attempt to keep things agnostic between headsets and handheld. It's a noble goal and one I definitely support for immersive-ar sessions, but for mobile-first experiences it's a limitation.
Polyfilling is complicated
iOS devices currently support neither WebXR nor the Full-screen API on phones, so the best a polyfill could do would be a WebGL canvas that fills the space between status bar and URL bar.
Combined with the extreme aspect ratio of some Android devices and the WebXR crop-to-fill fullscreen treatment, that means that even with a polyfill, cross-platform content would need to be authored to work with a very wide range of aspect ratios and FOVs.
Handheld AR doesn't need to be full screen
Take Sketchfab, where by default 3D content is shown in a canvas on a page, with other content around it - title, description, comments etc. I see no reason why an "AR" view shouldn't also be possible in the smaller canvas setup. There's of course also a "full-screen" button, but to me that feels orthogonal to the AR feature.
Handheld-specific UX is a valid use case
Some handheld AR use cases and UX designs depend on that form-factor. For example, an experience that guides the user to take a selfie with the front-facing camera, and then flips to the rear camera to place an auto-generated avatar in a world-tracked experience. It's clearly not an experience that is applicable to headsets, but a perfectly reasonable flow for a site designed for mobile users.
As the vast majority of current AR-capable hardware is handheld it's reasonable for sites to specifically design for that form factor.
Rendering at screen refresh rate may be more appropriate
ARCore usually runs at 30 FPS, and immersive-ar sessions (on Chrome at least) use that for their update rate. If the content contains fast motion and the user motion is expected to be more minimal, it might be preferable to update the content rendering at 60 FPS even if the camera frame (and other XRFrame pose data) only updates at 30 FPS.
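One way to picture that decoupling is sketched below: the tracker writes its most recent pose into a small store whenever it has one, and the render loop reads whatever is current on every display frame while advancing the content animation from its own clock. LatestPoseStore, TrackedPose and spinAngleRadians are hypothetical names for illustration; nothing here is an existing WebXR interface.

```typescript
interface TrackedPose {
  timestampMs: number;
  position: [number, number, number];
}

class LatestPoseStore {
  private latest: TrackedPose | null = null;

  // Called whenever the tracker delivers a new camera frame and pose
  // (around 30 times per second in the scenario described above).
  update(pose: TrackedPose): void {
    if (this.latest === null || pose.timestampMs >= this.latest.timestampMs) {
      this.latest = pose;
    }
  }

  // Called from the render loop at display rate; it may return the same pose
  // on consecutive frames when no newer tracking data has arrived.
  current(): TrackedPose | null {
    return this.latest;
  }
}

// Content animation is driven by render time, not by the age of the pose,
// so fast-moving content stays smooth even when the pose repeats.
function spinAngleRadians(renderTimeMs: number, revolutionsPerSecond: number): number {
  const fullTurn = 2 * Math.PI;
  return ((renderTimeMs / 1000) * revolutionsPerSecond * fullTurn) % fullTurn;
}
```

In a page, the render side would typically be driven by requestAnimationFrame, with the camera image and pose reused for two display frames in a row while the animated content is redrawn on both.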
"WebAR" exists and is pretty capable
Mobile browsers have almost universal support for Device Motion, getUserMedia, and WebGL. That combination alone allows simple 3-DoF orientation-tracked content overlaid on a live camera feed, rendered into an arbitrary webgl canvas anywhere in the DOM.
Combining that camera feed with computer vision code running in the page (typically compiled from C++ to WebAssembly) enables various 6-DoF tracking approaches. There are multiple commercial and open-source options providing "WebAR" tracking libraries.
WebXR for handheld AR does not exist in isolation, and right now purely client-side WebAR implementations tick more of the boxes for commercial handheld AR uses (in short - full layout control & better cross-platform support). If WebXR was easier to leverage as a progressive enhancement for these use cases then I would expect it to see much wider adoption.
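As a rough sketch of what "progressive enhancement" means here, a site could detect the baseline WebAR building blocks first and only prefer WebXR when it is available. The type names, the EnvLike shape and the chooseBackend policy are assumptions made for this example. In a real page the immersiveAr flag would come from awaiting navigator.xr.isSessionSupported("immersive-ar"), and the environment passed to detectBaseline would be the window object.

```typescript
type Backend = "webxr" | "webar" | "none";

interface Capabilities {
  immersiveAr: boolean; // result of the WebXR support query, supplied by the caller
  camera: boolean;      // getUserMedia is available for a live camera feed
  webgl: boolean;       // WebGL is available for rendering into a canvas
}

// Structural subset of the browser globals, so the sketch can run anywhere.
interface EnvLike {
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
  WebGLRenderingContext?: unknown;
}

// Detect the pieces that page-level "WebAR" libraries build on.
function detectBaseline(env: EnvLike): { camera: boolean; webgl: boolean } {
  return {
    camera: typeof env.navigator?.mediaDevices?.getUserMedia === "function",
    webgl: typeof env.WebGLRenderingContext !== "undefined",
  };
}

// Illustrative policy: use native tracking through WebXR when it is there,
// otherwise fall back to camera plus in-page tracking, otherwise no AR view.
function chooseBackend(caps: Capabilities): Backend {
  if (caps.webgl && caps.immersiveAr) return "webxr";
  if (caps.webgl && caps.camera) return "webar";
  return "none";
}
```

With today's API the "webxr" branch also implies the full-screen mode switch discussed above, which is the main reason this kind of enhancement is awkward to adopt in layouts built around an in-page canvas.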
I've posted two potential ideas to address these shortcomings of WebXR for handheld AR.
In my view it would be great to have both. I suspect inline-ar will find more natural support, but as it's likely a more complex implementation job I wonder if it's worth the effort for the benefits it brings over the lower-level #78 proposal, which would immediately make WebXR usable as a progressive enhancement in all of our tools, libraries, and commercial projects.
I'm looking forward to hearing people's thoughts on these proposals.
/facetoface I'd appreciate some time to discuss these thoughts at the f2f. I'll be joining remotely from London, so morning SF time is best for me.