immersive-web/model-element

Exit interaction event (with data?)

marcoscaceres opened this issue · 11 comments

When exiting an AR experience, it's sometimes useful to pass data from the model back out to the page. On iOS [1], for instance, a "message" event is sent:

[Screenshot: a "message" event carrying the data "_apple_ar_quicklook_button_tapped..."]

This then allows the web page to take over and perform some action. In the case above, it triggers Apple Pay through (presumably) the Payment Request API.

Obviously, the "message" event with the custom .data "_apple_ar_quicklook_button_tapped..." is not something we would want to standardize, but it might be good to consider some kind of user activation, resulting from the format itself, causing the scene to exit with some action. The .data could be an IDL object (or something better) that could be used to handle the action (e.g., buy a thing).
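To make this concrete, here is a minimal sketch; the "exit" event name and the shape of event.data are invented for illustration, as nothing here is standardized:

```js
// Hypothetical sketch — the "exit" event name and the event.data shape
// are invented; nothing here is standardized.
const model = document.querySelector("model");

model.addEventListener("exit", (event) => {
  // event.data could be an IDL-defined object describing the user
  // activation that caused the scene to exit.
  if (event.data?.action === "buy") {
    startCheckout(event.data.sku); // page-defined helper, e.g. wrapping
                                   // the Payment Request API
  }
});
```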

[1] https://developer.apple.com/videos/play/wwdc2020/1

This implies that the user agent won't handle the AR session...?

Potentially, yes. Like, in Safari, viewing a USDZ file is handed off to AR Quick Look.

The explainer said this:

Consider a browser or web-view being displayed in Augmented Reality. The developer wants to show a 3D model in the page. In order for the model to look accurate, it must be rendered from the viewpoint of the user—otherwise it is a flat rendering of a three-dimensional image with incorrect perspective.

A solution to this would be to allow the web page, in particular the WebGL showing the 3D model, to render from the perspective of the user. This would involve granting too much private information to the page, possibly including the camera feed, some scene understanding, and very accurate position data on the user. It should not be a requirement that every web page has to request permission to show a 3D model in this manner. The user should not have to provide access to this sensitive information to have the experience.

Furthermore, there is more information needed to produce a realistic rendering, such as the ability to cast and receive shadows and reflections from other content in the scene, that might not be on the same web page.

This means that rendering in Augmented Reality is currently limited to a system component, different from the in-page component, leading to inconsistent results.

Maybe I misunderstood but it sounded like one of the main reasons we needed this new element was to stop using AR Quick Look?

I do think enter + exit events are important but this issue does bring up a good point re eventing.

Today a lot of insights are derived from the playback time of a <video>. Ideally, the 3D equivalent would be knowing the viewing angle and distance to a specific object. This would be really valuable information, at least in a commerce context, and would allow merchants to learn which parts of the model customers are interested in. This is not possible today with AR Quick Look.

Of course, there's the proposed getCamera method that could help, but will this method be available in an AR scenario when using the full-screen options that are briefly discussed in the explainer?

Likely this idea should be tracked in a separate issue if it's not covered elsewhere.
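For illustration, what I'm describing might look something like the sketch below; it assumes the proposed getCamera method and invents a return shape (orientation plus distance), since none is specified:

```js
// Illustrative only: getCamera is merely proposed, and the return shape
// used here (orientation + distance) is invented.
const model = document.querySelector("model");

setInterval(async () => {
  const camera = await model.getCamera(); // proposed, not yet specified
  const { yaw, pitch } = camera.orientation; // hypothetical shape
  recordEngagement({ yaw, pitch, distance: camera.distance }); // page-defined
}, 1000);
```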

This would be really valuable information, at least in a commerce context, and would allow merchants to learn which parts of the model customers are interested in.

No, unfortunately. If I'm understanding what you are saying, that would go against the privacy principle that <model> shouldn't ever reveal where the user is looking.

This is not possible today with AR Quick Look.

This is by design, I believe.

I'd love to understand these privacy principles more. Is there any documentation?

What I'm describing is equivalent to tumbling a 3D model and registering where the camera is in relation to the model at any given time.

In my comment I am not talking about details of surfaces, location, etc., but simply the orientation of the camera in relation to the model. I'm not sure there is a privacy concern there, but I may have overlooked something.

@mrdoob wrote:

Maybe I misunderstood but it sounded like one of the main reasons we needed this new element was to stop using AR Quick Look?

Sorta (but it depends on what you mean by "using AR Quick Look"): something still needs to be the renderer, right? How it's all then composited is left to the user agent. So I'm not sure it matters whether it's AR Quick Look or not, so long as the end result is what the user expects (shadows are cast in the right places, lighting is as expected, and so on).

@mikkoh wrote:

I'd love to understand these privacy principles more. Is there any documentation?

Maybe:
https://immersive-web.github.io/webxr/privacy-security-explainer.html

But generally speaking, just as browsers today don't randomly turn on a webcam and track where a user is looking on a page, an AR-enabled environment wouldn't report where a user's gaze is pointing (unless the user has given express permission).

In my comment I am not talking about details of surfaces, location, etc., but simply the orientation of the camera in relation to the model. I'm not sure there is a privacy concern there, but I may have overlooked something.

The concept of a "camera" is actually a bit wacky and doesn't make sense when you consider this in AR (see #11 and why it breaks in #48).

The thing to understand is that in AR, I could rotate an object with my hand (effectively changing the camera), but I could be standing to the right side of the object, looking at it from above:

[Image: looking down at the object]

The "camera" position is actually the rotation of the object, but it tells you basically nothing about what point or detail I, as a user, am actually interested in: In AR, I could be as far or as close to the object as I want. Or I could even be crouching underneath and looking at the bottom of the object, but the page wouldn't have any idea based on the camera's position.

[Image: looking up at the object]

The implication is that getting the "camera" position is not super useful... but it is useful for developers to be able to orient the "camera" to show off various parts of an object that might be interesting. For example, there could be a button that rotates the object so the underside is facing forward (relative to the document, not to the user). Pressing that button is a strong indicator that users might be interested in looking at the underside (which makes sense)... but not necessarily that they are actually looking at it, or for how long, etc.
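As a sketch of that pattern (the orientation property and the analytics helper are invented purely for illustration; no such API is specified for <model>):

```js
// Hypothetical sketch — "orientation" is an invented API, used only to
// illustrate a page-driven control that rotates the object itself.
const model = document.querySelector("model");

document.querySelector("#show-underside").addEventListener("click", () => {
  // Rotate the entity, not the user's viewpoint: the page learns that
  // the button was pressed, but never where the user is actually
  // standing or looking.
  model.orientation = { pitch: -90, yaw: 0, roll: 0 }; // hypothetical
  recordInterest("underside"); // page-defined analytics helper
});
```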

Was there a discussion that Quick Look was going to be part of the model proposal?
It's certainly a nice workflow, but I don't know how flexible we can make it so an author could customize it.

DR-xR commented

There may need to be a lot more interaction between the 3D content and the rest of the DOM page. A configurator would allow the user to swap parts, items, colors, etc. on an existing model. When the user reaches the desired configuration, the entire set of selections needs to be exported (or tracked as the DOM page runs) so an order can be created.

Preventing this kind of interaction will limit the usefulness of <model> and will cause people to create a lot of customized solutions to the problem. This kind of capability may be beyond the initial scope of <model>, but definition work now needs to be done in such a manner as not to preclude it later on.
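To illustrate the kind of hand-off I mean (everything below is hypothetical; <model> currently defines no API for modifying scene content):

```js
// Hypothetical configurator sketch — swapPart() is invented, since
// <model> currently defines no API for modifying scene content.
const model = document.querySelector("model");
const selections = {};

function choose(part, value) {
  selections[part] = value;
  model.swapPart(part, value); // invented: update the rendered scene
}

choose("seat", "leather");
choose("color", "forest-green");

// Export the accumulated selections so an order can be created:
const order = JSON.stringify(selections);
```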

For the first iteration, we should concentrate on a minimum viable product.
Once that is implemented and shipping, it can be extended. We will also have a better idea of what is important.