jacobnzw/SSMToybox

Check StudentInference


There is some fishiness in StudentInference that I don't understand.
Fix this after #7.

  • Why are MTs in StudentInference taking scale matrix? They should be taking covariance, because they are moment transforms after all! No?
  • Why is DoF of the noise RVs used for construction of the FS unit SPs? (see TPQStudent)
  • Check StudentInference._measurement_update(): the measurement update seems to have the relationship between scale and covariance backwards.
  • Check that StudentInference correctly implements Student filter.

Subtle point about moment transforms

Before getting to the crux of this issue, we need to talk about some philosophical differences between moment transforms and quadrature rules.

The general idea behind moment transforms (MT) is to take in mean and covariance (of a random variable with any density) and return transformed mean and covariance. From this it follows that the design of MTs can't rely on any specific functional form of the input density! In other words, MTs are density-agnostic, because they only care about moments of the input random variable (RV). (They are called moment transforms after all!) In short: moments in, moments out, don't care about the density of input RV.
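To make the "moments in, moments out" philosophy concrete, here is a minimal, hypothetical example of a density-agnostic MT: the first-order linearization transform. It uses nothing but the input mean and covariance, so it applies to a Gaussian, Student, or any other input RV with two finite moments (names and signatures are illustrative, not the SSMToybox API):

```python
import numpy as np

def linear_mt(g, jac_g, mean, cov):
    """Density-agnostic moment transform via first-order linearization.

    Only the first two moments of the input RV go in; approximations of
    the first two moments of g(x) come out. No density is assumed.
    """
    m_out = g(mean)            # transformed mean (first-order approximation)
    J = jac_g(mean)            # Jacobian of g evaluated at the input mean
    P_out = J @ cov @ J.T      # transformed covariance
    return m_out, P_out

# usage: identical call whether the input RV is Gaussian, Student, or other
g = lambda x: np.array([x[0] ** 2 + x[1]])
jac = lambda x: np.array([[2 * x[0], 1.0]])
m, P = linear_mt(g, jac, np.array([1.0, 0.0]), np.eye(2))
```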

The unscented transform was presented, in the original publications, as a general method for "nonlinear transformation of means and covariances" and thus fits into the MT design philosophy just fine.

Then some smartasses came along and noticed that it's just a fully-symmetric quadrature rule for approximating integrals w.r.t. the Gaussian weight function. And herein lies the subtle issue: quadrature rules are designed for specific weight functions (densities). That means that, in re-casting the UT in the language of quadrature rules, we imposed restrictions (consciously or otherwise) on the density of the input RV. In other words, from the quadrature standpoint, the UT is an FS rule for approximating integrals w.r.t. the Gaussian density (weight function).
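The quadrature reading of the UT can be sketched as follows. The unit sigma-points and weights below are the classic UT construction; the Gaussian weight function is baked into them (the rule integrates polynomials up to degree 3 exactly under a Gaussian), and the decoupling substitution `x = m + sqrt(P) * xi` is what couples them to the input moments. A hypothetical sketch, not the SSMToybox implementation:

```python
import numpy as np

def ut_points_weights(dim, kappa=0.0):
    """Fully-symmetric unit sigma-points and weights of the classic UT.

    These weights are derived for the Gaussian weight function; that is
    where the density assumption sneaks in.
    """
    c = dim + kappa
    pts = np.hstack([np.zeros((dim, 1)),
                     np.sqrt(c) * np.eye(dim),
                     -np.sqrt(c) * np.eye(dim)])
    w = np.full(2 * dim + 1, 1.0 / (2 * c))
    w[0] = kappa / c
    return pts, w

def ut_apply(g, mean, cov, kappa=0.0):
    """Approximate moments of g(x) via the substitution x = m + sqrt(P) xi."""
    xi, w = ut_points_weights(mean.size, kappa)
    L = np.linalg.cholesky(cov)                      # matrix square root of P
    fx = np.array([g(mean + L @ p) for p in xi.T])   # propagate sigma-points
    m_out = w @ fx
    d = fx - m_out
    P_out = d.T @ np.diag(w) @ d
    return m_out, P_out
```

For a linear map the rule is exact, e.g. `ut_apply(lambda x: 2 * x, m, P)` returns `2 * m` and `4 * P`.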

The theory of quadrature therefore clashes with the MT design philosophy (which lays emphasis on sole dependence on the input moments). Thus, in adopting the language of quadrature, as helpful as it is, we need to be acutely aware of this philosophical incompatibility!

When a designer adopts the quadrature viewpoint to come up with novel MT, he's really designing density-dependent "moment" transforms. The scare quotes just point out that what is being designed, in that case, isn't really a moment transformation, but more like a (sufficient) statistic transformation.

A similar philosophical distinction shows up when designing a filter. The Kalman filter (KF) was originally derived without any requirements on the noise densities; it only cares about the existence of the first two moments. The KF can therefore be used with arbitrary noise distributions and it will still be the best linear MMSE estimator. To derive the KF from the Bayesian viewpoint, one needs to assume that p(x_k, z_k | z_1:k-1) is jointly Gaussian and then proceed from there, ultimately arriving at identical equations.
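The "only the first two moments" point is visible in the standard KF measurement update, sketched below for a linear measurement model z = Hx + r (illustrative code, not from SSMToybox). Nothing in it invokes a density:

```python
import numpy as np

def kf_update(m, P, z, H, R):
    """One Kalman measurement update: only first two moments in and out.

    No Gaussianity is assumed anywhere; for non-Gaussian noises with the
    same moments, the result is still the best linear MMSE estimate.
    """
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    m_new = m + K @ (z - H @ m)       # updated mean
    P_new = P - K @ S @ K.T           # updated covariance
    return m_new, P_new
```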

Derivation of the Student filter, on the other hand, relies heavily on the Bayesian perspective (at least I don't know of any old-school derivations). It seems to be a creature that is easier to find in the Bayesian world. Thus it shouldn't really be viewed as fitting into the local filtering world, in the sense that it doesn't propagate the first two moments. But it can be made to return these.

Having explained all that, we can now move on to the crux of this issue.


Why are moment transforms taking scale matrix? (in StudentInference._time_update())

Because the fully-symmetric rule is designed for Student densities with unit scale matrix (not covariance matrix!), which has the consequence that the decoupling substitution involves the square root of the scale matrix! This is exactly what's happening in SigmaPointTransform.apply(mean, cov, ...), where we supply the scale matrix as the cov argument, which is OK as long as we use FS SPs. The output is the transformed mean and covariance (not scale matrix!). So the procedure works like this: mean and scale matrix in, mean and covariance out. I know, not very intuitive, but that's how it is!
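A sketch of the "mean and scale in, mean and covariance out" convention, with hypothetical names (not the actual SSMToybox signatures). The decoupling substitution uses the square root of the scale matrix, and on the input side the relationship covariance = dof / (dof - 2) * scale (for dof > 2) is what distinguishes the two matrices:

```python
import numpy as np

def student_mt_apply(g, mean, scale, unit_pts, weights):
    """Sketch: FS sigma-point transform for a Student input RV.

    unit_pts are assumed designed for a unit-scale Student density, so the
    decoupling substitution x = m + sqrt(S) xi uses the square root of the
    SCALE matrix S, not of the covariance.
    """
    L = np.linalg.cholesky(scale)                    # sqrt of the scale matrix
    fx = np.array([g(mean + L @ p) for p in unit_pts.T])
    m_out = weights @ fx                             # transformed mean
    d = fx - m_out
    P_out = d.T @ np.diag(weights) @ d               # transformed COVARIANCE
    return m_out, P_out

# input-side convention check: for dof > 2,
# covariance of the input Student RV = dof / (dof - 2) * scale
dof, scale = 4.0, np.eye(2)
input_cov = dof / (dof - 2) * scale                  # = 2 * I here
```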

As an aside, consider the consequences of failing to realize the above philosophical distinctions:

  1. unit sigma-points depend on the DoF, which is a parameter specific to Student density
  2. FS unit SPs thus can't be used w/ arbitrary densities as originally intended by MTs

Both of these points are OK as long as we don't call the designed procedure a "moment" transform.

SSMToybox design discrepancies

StateSpaceInference sub-classes are specific to a particular density (i.e. GaussianInference, StudentInference). That is, the sub-classes are designed from the Bayesian viewpoint, which imposes a density on the joint p(x, z). But the same philosophy isn't followed for moment transforms!

For reasons of consistency, the same Bayesian philosophy should be reflected in the class hierarchy of the "moment" transforms (they should really be referred to as statistic transforms).

The class hierarchy might look like this:

  • GaussianTransform or GaussianStatisticTransform
    • GaussianSigmaPointTransform
  • StudentTransform or StudentStatisticTransform
    • StudentSigmaPointTransform

There might be a common base class StatisticTransform, but currently I don't see the need for it.
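The proposed hierarchy could be sketched like this (a hypothetical skeleton following the class names suggested above, including the optional common base; docstrings state the intended input statistics of each branch):

```python
class StatisticTransform:
    """Optional common base; density-specific subclasses fix the input family."""
    def apply(self, f, mean, *stats):
        raise NotImplementedError

class GaussianStatisticTransform(StatisticTransform):
    """Input statistics: mean and covariance (jointly Gaussian assumption)."""

class GaussianSigmaPointTransform(GaussianStatisticTransform):
    """Sigma-point rules designed for the Gaussian weight function (e.g. UT)."""

class StudentStatisticTransform(StatisticTransform):
    """Input statistics: mean, scale matrix and DoF (jointly Student assumption)."""

class StudentSigmaPointTransform(StudentStatisticTransform):
    """FS sigma-point rules designed for the Student weight function."""
```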

Why is DoF of the noise RVs used for construction of the FS unit SPs? (see TPQStudent)

  • Uncorrelated jointly Student RVs are still dependent (this follows from the functional form of the Student density, which can't be factored into marginals even with a block-diagonal scale matrix),
  • DoF is invariant to transformation,

The DoF of the noises is simply taken as the default DoF assumed for the MT input density. Heuristically, one might choose the smallest DoF.
(Dynamics MT: dyn_dof = min(dyn.init_rv.dof, dyn.noise_rv.dof); Measurement MT: obs_dof = min(dyn_dof, obs.noise_rv.dof))
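The heuristic above can be written out as a tiny helper (illustrative names, not the SSMToybox API). Taking the minimum is the conservative choice, since the smallest DoF corresponds to the heaviest tails:

```python
def default_transform_dofs(init_dof, dyn_noise_dof, obs_noise_dof):
    """Heuristic from the issue: assume the smallest DoF seen so far.

    dyn_dof is the DoF assumed for the dynamics MT input density,
    obs_dof the DoF assumed for the measurement MT input density.
    """
    dyn_dof = min(init_dof, dyn_noise_dof)
    obs_dof = min(dyn_dof, obs_noise_dof)
    return dyn_dof, obs_dof
```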