immersive-web/webxr

Remove velocity and acceleration from VRPose, reduce to a matrix on VRFrameData

Closed this issue · 12 comments

toji commented

In the spirit of simplification I'd like to propose effectively dropping the VRPose interface and reducing the head pose to a single matrix on VRFrameData.

Overview:
Multiple VR APIs don't provide linear/angularVelocity and linear/angularAcceleration, or only provide a subset of those values. This makes them very difficult to rely on, and in most cases apps may be better served by computing something like velocity themselves from the pose delta between one frame and the next.
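To make the "compute it from the pose delta" suggestion concrete, here is a minimal sketch. It assumes column-major 4x4 pose matrices in `Float32Array` layout (translation in elements 12-14, per WebGL/WebVR convention); the function name is illustrative, not part of any proposed API.

```javascript
// Estimate linear velocity by differencing the translation components of two
// consecutive column-major 4x4 pose matrices. `dt` is the frame delta in seconds.
function estimateLinearVelocity(prevPose, currPose, dt) {
  return [
    (currPose[12] - prevPose[12]) / dt,
    (currPose[13] - prevPose[13]) / dt,
    (currPose[14] - prevPose[14]) / dt,
  ];
}
```

As later comments in this thread note, this naive difference can behave poorly when the underlying poses are smoothed or predicted.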

Dropping those values would serve to provide several nice simplifications:

  • One less interface to worry about
  • No more nullable values, which has proven tricky for us to reason about
  • Reduces the number of mathematical primitives needed to 1: a 4x4 matrix.
  • We want to provide the pose as a matrix anyway, so this eliminates data duplication

I personally haven't seen apps using the velocity or acceleration values (which were only introduced to the API because they were in the Oculus API when this first started), and I'd rather err on the side of letting the dev community tell us they need them and adding them back into a future revision than codify something with an unclear use case forever more.

I also do think that velocity and acceleration may be more valuable for controllers, but I'd prefer to let the effort around speccing that out determine its usefulness within that context separately.

Proposed IDL:

interface VRFrameData {
  readonly attribute Float32Array leftProjectionMatrix;
  readonly attribute Float32Array leftViewMatrix;

  readonly attribute Float32Array rightProjectionMatrix;
  readonly attribute Float32Array rightViewMatrix;

  readonly attribute Float32Array poseMatrix;
};

I don't have an immediate use case for the HMD right now, though I could probably imagine one. It's definitely useful for the controllers, and I would be upset if they were removed from those poses.

I should note that I've tried calculating these values from frame deltas and have found it very difficult to get accurate/consistent results. Maybe someone knows a better way.

While modifying this, might be worth at least thinking about what direction the "poseMatrix" transforms things. Should it be analogous to a view matrix for consistency, or be the inverse of that to make certain other operations easier (e.g. extracting position of the pose)? I think you probably intended the latter and I don't disagree, but worth at least pausing to make sure we don't mind the minor inconsistency here (viewMatrix transforms to view space, projection matrix to projection space, but poseMatrix from pose space?).
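One way to see the two directions discussed above: for a rigid transform (rotation plus translation, no scale) the inverse is cheap, so a pose ("model") matrix and a view matrix are trivially interconvertible. A sketch, assuming column-major `Float32Array` layout; the function names are illustrative only.

```javascript
// Inverse of a rigid column-major 4x4 transform: transpose the 3x3 rotation
// block and set the new translation to -R^T * t. This is the relationship
// between a view matrix and the pose matrix: pose = inverse(view).
function invertRigid(m) {
  const out = new Float32Array(16);
  // Transpose the rotation block.
  out[0] = m[0]; out[1] = m[4]; out[2] = m[8];
  out[4] = m[1]; out[5] = m[5]; out[6] = m[9];
  out[8] = m[2]; out[9] = m[6]; out[10] = m[10];
  // New translation = -R^T * t.
  out[12] = -(out[0] * m[12] + out[4] * m[13] + out[8] * m[14]);
  out[13] = -(out[1] * m[12] + out[5] * m[13] + out[9] * m[14]);
  out[14] = -(out[2] * m[12] + out[6] * m[13] + out[10] * m[14]);
  out[15] = 1;
  return out;
}

// Extracting position is trivial from a pose matrix (elements 12..14), but
// requires the inversion above if you only have the view matrix.
function positionFromPose(pose) {
  return [pose[12], pose[13], pose[14]];
}
```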

I am good with this proposal, with the caveat that we address @ssylvan's concerns about the transform direction and reflect that in the final name of the matrix.

toji commented

The point about the naming is well taken. What do you think would be an appropriate way to disambiguate it? poseModelMatrix?

Hi. Since linear/angularVelocity and linear/angularAcceleration are actually provided by the HMD's gyroscope and accelerometer sensors, instead of making this data part of WebVR it could be worthwhile to consider relying on the Sensor APIs, such as Gyroscope (https://w3c.github.io/gyroscope) and Accelerometer (https://w3c.github.io/accelerometer/).
The client could obtain the relevant HMD motion sensor objects and calculate the HMD's current position and trajectory itself.

Possible IDLs:

partial interface VRDisplay {
  readonly attribute Accelerometer? accelerometer;
  readonly attribute Gyroscope? gyroscope;
};

or alternatively:

let gyro = new Gyroscope({ "VRdisplay": displayId });

wdyt?
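A sketch of how a page might consume such readings. The `"VRdisplay"` construction option is hypothetical (from the alternative above); `frequency`, the `reading` event, and `x`/`y`/`z` are the real Gyroscope interface. The pure integration helper is illustrative only.

```javascript
// Integrate a series of angular-velocity samples (rad/s about the up axis)
// taken at a fixed interval `dt` into a yaw estimate.
function integrateYaw(samples, dt) {
  return samples.reduce((yaw, s) => yaw + s.y * dt, 0);
}

// Browser wiring (requires a secure context and sensor permission):
// const gyro = new Gyroscope({ frequency: 60 /*, "VRdisplay": displayId */ });
// let yaw = 0;
// gyro.addEventListener('reading', () => { yaw += gyro.y * (1 / 60); });
// gyro.start();
```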

@ssylvan has a great trick for matrix naming conventions that he's written up. The rough idea is that you represent the multiplication order in your naming convention. For example, in a left-to-right (row-vector) system, you go with AtoD = AtoB * BtoC * CtoD.
I'll see if I can find the whole write-up.

I believe this is a pretty common convention, but it really should be part of Graphics Programming 101 because it's such a simple trick that more or less eliminates buggy matrix math. For the OpenGL convention (right-to-left multiplication), you would name all your matrices "fooFromBar", indicating that it's a matrix that transforms into the foo space, from the bar space. Then, a matrix multiplication chain looks like this:

projectionFromModel = projectionFromView * viewFromWorld * worldFromModel

Note that each multiplication sign has the same word on either side, which means that the matrices are compatible. Also note that the name for the combined matrix is just the two words on either side of the multiplication chain. Inverting a matrix swaps the words on each side of the variable: modelFromProjection = inverse(projectionFromModel).

Transforming between spaces is now no harder than slotting legos together - make sure the names match up on each side of the multiply operations, and the final result will make sense.

This convention is "associative", so you can pull out arbitrary groups into named variables. As long as you still follow the naming convention, things work:

viewFromModel = viewFromWorld * worldFromModel
projectionFromModel = projectionFromView * viewFromModel 

Note that if you make a mistake, they're easy to find because the words on either side of a multiplication sign no longer match:

// Here someone accidentally multiplied in the worldFromModel matrix, even though it's 
// already baked into viewFromModel. 
projectionFromModel = projectionFromView * viewFromModel * worldFromModel

That last multiply has a model on one side and a world on the other. Once you get used to this, these kinds of mistakes stick out like a sore thumb.
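The same trick works at the code level. A minimal sketch with a column-major 4x4 multiply; the specific matrices are made up purely for illustration.

```javascript
// Column-major 4x4 multiply, named per the "aFromB" convention: the inner
// names must match, and the product's name is the outer pair (aFromC).
function mul(aFromB, bFromC) {
  const aFromC = new Array(16).fill(0);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      for (let k = 0; k < 4; k++) {
        aFromC[col * 4 + row] += aFromB[k * 4 + row] * bFromC[col * 4 + k];
      }
    }
  }
  return aFromC;
}

// Names line up on each side of every multiply:
// const projectionFromModel =
//     mul(projectionFromView, mul(viewFromWorld, worldFromModel));
```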

Anyway, this isn't exactly the same as the OpenGL conventions (e.g. the model matrix would become worldFromModel), but it does make it crystal clear what each matrix is, so it's perhaps worth considering something like this to make the naming clearer, especially since many apps will eventually have to deal with a lot of different spaces (spatial controllers, anchors, stages, etc.)

Whatever we do here, we should make sure the docs at least spell out what spaces the matrix transforms between (yes, "view matrix" has a known meaning, but might as well spell it out).

EDIT: Oh, I will say that "poseModelMatrix" at least decoded to the right thing in my brain, so that would also work. It doesn't have the same nice systematic property as what I described above (which also generalizes to things like anchors and stages), but if you know OpenGL convention (e.g. what a model matrix is), then "poseModelMatrix" is probably easy enough to understand.

toji commented

Good to know that poseModelMatrix is an improvement in parseability at least. I don't object to the AfromB convention you mentioned either; I'm having a little bit of trouble mentally mapping it to the matrices we want, though:

interface VRFrameData {
  readonly attribute Float32Array leftProjectionFromViewMatrix;
  readonly attribute Float32Array leftViewFromWorldMatrix;

  readonly attribute Float32Array rightProjectionFromViewMatrix;
  readonly attribute Float32Array rightViewFromWorldMatrix;

  readonly attribute Float32Array poseModelFromWorldMatrix;
};

That's very... verbose.

Verbosity is a valid concern. Relying on convention to use "well known" names reduces verbosity at the expense of making the meaning less explicit from the names alone, i.e. you have to know what "view matrix" means, or look it up in separate spec language somewhere.

I would call out that some of these names might have to change anyway when/if we introduce coordinate frames (in that case, the conventional names may not be close enough to what we actually mean to be appropriate anymore).

FYI: the recently added AbsoluteOrientationSensor (the populateMatrix() method in particular) would provide the same data as poseModelFromWorldMatrix proposed at #185 (comment)

FWIW, I was intending to add angular velocity readings for the HMD for Chrome on Android. One primary motivation for this is that we're using this metric for a hardware latency tester, and the original approach of estimating angular velocity from two successive poses in Javascript was working so badly that it's basically unusable for this purpose. One reason was that the headset poses were predicted values which made them unsuitable for measuring latency. I think it would be useful to distinguish raw-ish sensor readings from possibly smoothed / predicted pose estimates.
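For reference, the naive angular-velocity estimate described above looks something like the following sketch (assuming column-major 4x4 pose matrices; the function name is illustrative). Its fragility on smoothed/predicted poses is exactly the problem being reported.

```javascript
// Naive angular-speed estimate from two consecutive column-major poses: the
// relative rotation is R2 * R1^T, and its rotation angle comes from the trace.
function angularSpeed(prevPose, currPose, dt) {
  // trace(R2 * R1^T) equals the elementwise (Frobenius) inner product of the
  // two 3x3 rotation blocks, so we never form the full product.
  let trace = 0;
  for (let col = 0; col < 3; col++) {
    for (let row = 0; row < 3; row++) {
      trace += currPose[col * 4 + row] * prevPose[col * 4 + row];
    }
  }
  // Clamp for numerical safety, then angle = acos((trace - 1) / 2).
  const cosAngle = Math.min(1, Math.max(-1, (trace - 1) / 2));
  return Math.acos(cosAngle) / dt; // radians per second
}
```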

I do think it's valuable to provide these sensor readings in some way, and I would strongly prefer not to have them removed from the spec until there's a working new way to get this data. Other than that, I don't have a strong preference where it lives.

Also, the reference frame for reporting these values seems to be inconsistent across headsets, see #212. If we do keep them, the spec should be clear about the semantics.

toji commented

The primary goal of this issue was addressed in WebXR, which does expose poses as a matrix. If velocity and such have clearly identified needs in the future they can be added in > v1.