Negative eigenvalues for square root of Hessian matrix
epicjung opened this issue · 6 comments
Hi, I have a follow-up question about the square root of the Hessian matrix from #114.
I found that the A matrix is not necessarily positive semi-definite, which means it can have negative eigenvalues. In the select call used to compute S, all the negative eigenvalues are set to zero.
I think this could give wrong linearized_jacobians and thus fail to update the residual accurately in the marginalization factor (residual_new = old residual + linearized_jacobians * dx). I just want to know how to handle this kind of situation. Many thanks in advance.
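For context, here is a minimal Eigen sketch of what I mean (illustrative only, not the actual VINS-Mono code; names like buildSqrtPrior are made up):

```cpp
#include <Eigen/Dense>

// Sketch: build the square-root prior from A = V diag(s) V^T, clamping negative
// eigenvalues to zero -- which is what the select call amounts to.
void buildSqrtPrior(const Eigen::MatrixXd& A, const Eigen::VectorXd& b,
                    Eigen::MatrixXd& linearized_jacobians,
                    Eigen::VectorXd& linearized_residuals) {
  Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> saes(A);

  // Negative eigenvalues become zero, i.e. A is projected onto the nearest
  // positive semi-definite matrix.
  Eigen::VectorXd S = saes.eigenvalues().cwiseMax(0.0);
  Eigen::VectorXd S_inv =
      S.unaryExpr([](double s) { return s > 0.0 ? 1.0 / s : 0.0; });

  // J = diag(sqrt(S)) V^T, so J^T J = V diag(S) V^T (equal to A only if A is PSD).
  linearized_jacobians =
      S.cwiseSqrt().asDiagonal() * saes.eigenvectors().transpose();
  // r0 chosen so that J^T r0 reproduces b (exactly when A is positive definite).
  linearized_residuals =
      S_inv.cwiseSqrt().asDiagonal() * saes.eigenvectors().transpose() * b;
}

// In the marginalization factor the prior residual is then propagated linearly:
//   residual = linearized_residuals + linearized_jacobians * dx;
```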
Shameless self-plug, but maybe this is interesting for you: We wrote a paper discussing the issue of the marginalization Hessian becoming indefinite over time, which if untreated causes estimator failure: https://arxiv.org/pdf/2109.02182.pdf
You can deal with this in different ways:
- Computing a Cholesky decomposition of the marginalization Hessian and setting negative eigenvalues to 0 to prevent errors from building up. We found this prevents estimator divergence, but it doesn't give best accuracy.
- Performing the marginalization in Jacobian (i.e. square root) form. This gives best accuracy, and prevents negative eigenvalues from occurring by design. It usually has slightly increased runtime for the marginalization (which generally is a small part of the overall estimator runtime cost).
Thank you for the paper suggestion. If I read it correctly, your paper adopts the second method you suggested, right? Instead of solving J^T J dx = -J^T r, you are trying to solve J dx = -r, right?
Yes, the second method is what the paper suggests, correct.
However, this is somewhat independent of how you solve the linear system during optimization. In fact, we still form the normal equations and solve J^T J dx = -J^T r during the optimization (at least for the reduced camera system). Only when marginalizing, we take the residuals to marginalize, form J dx = -r, and directly marginalize some of the variables in square root form using QR decomposition, to get a new marginalization prior on the remaining variables in the form of the reduced J and r.
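To make that concrete, here is a minimal dense Eigen sketch of the idea (not the actual Basalt code; it assumes the parameters to marginalize come first in the ordering and that their Jacobian block has full column rank):

```cpp
#include <Eigen/Dense>

// Square-root marginalization sketch: given the stacked linearized system
//   [J_mu  J_kappa] [dx_mu; dx_kappa] = -r
// for the residuals to marginalize, eliminate dx_mu (the first m columns)
// with a QR decomposition and keep only the rows that no longer involve it.
void marginalizeSqrt(const Eigen::MatrixXd& J, const Eigen::VectorXd& r, int m,
                     Eigen::MatrixXd& J_prior, Eigen::VectorXd& r_prior) {
  const Eigen::MatrixXd J_mu = J.leftCols(m);
  const Eigen::MatrixXd J_kappa = J.rightCols(J.cols() - m);

  // J_mu = Q [R; 0]; applying Q^T to the whole system confines dx_mu to the
  // top m rows.
  Eigen::HouseholderQR<Eigen::MatrixXd> qr(J_mu);
  const Eigen::MatrixXd Q = qr.householderQ();  // full Q; fine for a dense sketch

  const Eigen::MatrixXd Jk_rot = Q.transpose() * J_kappa;
  const Eigen::VectorXd r_rot = Q.transpose() * r;

  // The top m rows (containing R) can always be satisfied by choosing dx_mu,
  // so they are dropped; the remaining rows are the new prior on dx_kappa.
  J_prior = Jk_rot.bottomRows(Jk_rot.rows() - m);
  r_prior = r_rot.tail(r_rot.size() - m);
}
```

Since H = J^T J is never formed, the prior stays a valid Jacobian/residual pair by construction and negative eigenvalues cannot appear.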
I am examining the code at https://gitlab.com/VladyslavUsenko/basalt/-/tree/master. I am still not clear about the second part of your answer. So, do you use QR decomposition to obtain dx from J dx = -r and use that for LM optimization? I don't see how it differs from the VINS implementation, besides that you are not using Ceres but LDLT for the inverse computation. In VINS's marginalization factor, it also feeds linearized_jacobians (marg_sqrt_H in your code) and linearized residuals (marg_sqrt_b in your code) to the optimization problem. Could you please elaborate on the differences? Cheers.
> I am still not clear about the second part of your answer.

I'm not 100% sure what you are referring to with "the second part of [my] answer", but I guess it's about the marginalization with QR.

> So, do you use QR decomposition to obtain dx from J dx = -r and use that for LM optimization?

No, this is the marginalization step, not the optimization step. We are not trying to determine dx here. Instead, we are setting up J and r for the residuals that we want to marginalize, then we use the square root marginalization technique that we introduce in the paper (which is based on QR decomposition) to arrive at a new J, r pair for the remaining variables, which then constitutes the new marginalization prior for the next round of optimization.

> I don't see how it differs from the VINS implementation, besides that you are not using Ceres but LDLT for the inverse computation.

This is the optimization step. Indeed, this should be very similar. I'm not sure how Ceres is configured for VINS here, but probably it is also doing LDLT to solve the linear system for the optimization in VINS-Mono.
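For reference, that shared optimization step is roughly the following (a sketch, not code from either project; LM damping omitted):

```cpp
#include <Eigen/Dense>

// Gauss-Newton step via the normal equations, solved with LDLT.
Eigen::VectorXd solveNormalEquations(const Eigen::MatrixXd& J,
                                     const Eigen::VectorXd& r) {
  const Eigen::MatrixXd H = J.transpose() * J;  // approximate Hessian
  const Eigen::VectorXd b = J.transpose() * r;
  return H.ldlt().solve(-b);                    // solves J^T J dx = -J^T r
}
```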
> In VINS's marginalization factor, it also feeds linearized_jacobians (marg_sqrt_H in your code) and linearized residuals (marg_sqrt_b in your code) to the optimization problem.

Yes, that is again about optimization. The difference is in how marg_sqrt_H and marg_sqrt_b are computed. If I understand what VINS-Mono does correctly (feel free to correct me if I'm wrong), then for marginalization they set up J and r for the residuals to marginalize, then they compute H and b, then they do the marginalization with the Schur complement, which gives a new H and b for the remaining variables, which they finally factorize into a new J and r on the remaining variables. But the resulting J and r are not the same as you would get from our procedure. First of all, matrix square roots are not unique, so there are infinitely many J and r that constitute a square root of H and b after marginalization. Second, and most importantly, what we show in the paper is that directly marginalizing in square root form is numerically more stable than going via the squared form H and b. This leads to better accuracy, in particular when computing in single precision, as our ablation study shows.
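For contrast, here is a rough sketch of that squared-form path as I understand it (again not the actual VINS-Mono code; a generic pseudo-inverse stands in for their eigenvalue-based inverse of the marginalized block):

```cpp
#include <Eigen/Dense>

// Squared-form marginalization: Schur complement on H and b with block order
// [mu (to marginalize) | kappa (to keep)], m = dimension of mu. A square root
// (J, r) of the result would then be recovered by factorization, as in the
// eigendecomposition sketch further up the thread.
void marginalizeSchur(const Eigen::MatrixXd& H, const Eigen::VectorXd& b, int m,
                      Eigen::MatrixXd& H_prior, Eigen::VectorXd& b_prior) {
  const int k = H.rows() - m;
  const Eigen::MatrixXd Hmm =
      0.5 * (H.topLeftCorner(m, m) + H.topLeftCorner(m, m).transpose());
  const Eigen::MatrixXd Hmk = H.topRightCorner(m, k);
  const Eigen::MatrixXd Hkm = H.bottomLeftCorner(k, m);

  const Eigen::MatrixXd Hmm_inv =
      Hmm.completeOrthogonalDecomposition().pseudoInverse();

  H_prior = H.bottomRightCorner(k, k) - Hkm * Hmm_inv * Hmk;
  b_prior = b.tail(k) - Hkm * Hmm_inv * b.head(m);
}
```

Both paths end with a (J, r) prior; the difference is numerical: forming H = J^T J squares the condition number before anything is factorized, which is what costs accuracy, especially in single precision.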
Crystal clear! Thank you for your answer. You made my day :)