decadenza/SimpleStereo

Clarify the meaning of K1 and K2 matrices

LightCannon opened this issue · 10 comments

Hello. Thanks for putting time into this awesome project.
In my previous issue (#5), you ended up modifying the code to use a single fit matrix, which fixed the disparity problem (and I have verified this in my own code).

However, the dimensions (and depths) I'm getting are larger than they should be. I reviewed the code for the Q matrix and found that you wrote in the comments:
"fx and fy are assumed the same for left and right (after rectification, they should)"
and
"cy1 is equal cy2 (as images are rectified)"

On comparing these terms in both K1 and K2, they seem quite different:
```
self.K1 =
array([[-6.98264792e+02,  7.34552202e+00,  1.00334127e+03],
       [-4.67178484e-12, -7.41609493e+02,  7.94008922e+02],
       [-6.28781668e-14, -1.32261075e-20,  1.05269262e+00]])

self.K2 =
array([[-6.83582854e+02,  4.87362552e+00,  9.40342696e+02],
       [-5.20617274e-12, -7.23981614e+02,  7.75135521e+02],
       [-6.13919877e-14,  9.75077436e-22,  1.02767037e+00]])
```
In addition, I see there are values for a1 and a2 whose purpose I don't understand, but they are not small.

I'm not sure where the problem lies, but I suspect K1 and K2 have some issue that I cannot see. Could you take a look at that?

Regards.

The only wrong thing here is my comment "Final intrinsic matrices that keep track of all affine transformations applied" on this line, which I am going to correct to "Final matrices that keep track of all the transformations applied". The code is right.
K1 and K2 are not the final intrinsic matrices but the final transformation matrices. As you can see on this line, they comprise all the transformations required by rectification:

  1. Cancel the original camera matrix
  2. Apply common orientation (actual rectification)
  3. Apply common intrinsic matrix (Fit).
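The three steps above can be sketched as a single homography composition. This is only an illustration with made-up values; the variable names are assumptions, not actual SimpleStereo attributes:

```python
import numpy as np

Kold = np.array([[700.0,   0.0, 320.0],   # original camera matrix
                 [  0.0, 700.0, 240.0],
                 [  0.0,   0.0,   1.0]])
Rcommon = np.eye(3)                        # common rectifying rotation
Fit = np.array([[650.0,   0.0, 300.0],    # common intrinsic matrix (Fit)
                [  0.0, 650.0, 230.0],
                [  0.0,   0.0,   1.0]])

# 1. cancel the original camera matrix  2. rotate  3. apply Fit
H = Fit @ Rcommon @ np.linalg.inv(Kold)
```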

Values on position [0,1] come from both rotations.

The final intrinsic matrix, which is equal for left and right, is defined as Fit (see here).

EDIT: Sorry @LightCannon, I was wrong. I reviewed the code and K1 and K2 are indeed the final intrinsic matrices for the left and right cameras. If you check this line, what happens is that I take the final rectification transform and cancel out the original intrinsic matrix and the common rotation contained in self.rectHomography1. What remains is any affine transformation applied after rectification. On top of that, the new affine transform contained in Fit is applied. So K1 contains the final affine transform applied after rectification, which is why it is used in the calculation of the Q matrix.
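A minimal numeric check of that factorisation (all matrices below are made-up illustrative values; `Rcommon` stands in for the rotation contained in `self.rectHomography1`, the rest are assumptions):

```python
import numpy as np

theta = 0.05  # small common rectifying rotation about the optical axis
Rcommon = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
Kold1 = np.array([[700.0,   0.0, 320.0],   # original left intrinsics
                  [  0.0, 710.0, 240.0],
                  [  0.0,   0.0,   1.0]])
A1 = np.array([[0.9, 0.05, 12.0],          # affine applied by rectification
               [0.0, 0.95, -7.0],
               [0.0, 0.0,   1.0]])
Fit = np.array([[650.0,   0.0, 300.0],     # common transform applied on top
                [  0.0, 650.0, 230.0],
                [  0.0,   0.0,   1.0]])

# Full rectifying homography for the left image
H1 = A1 @ Rcommon @ np.linalg.inv(Kold1)

# Cancel the original intrinsics and the common rotation: what remains is
# the affine transformation applied after rectification...
residual = H1 @ Kold1 @ np.linalg.inv(Rcommon)

# ...and applying Fit on top gives the K1 discussed above
K1 = Fit @ residual
```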

Sorry if my question seems naive, but why are we not building the Q matrix out of the "Fit" elements (since it is now the unified intrinsic matrix for both cameras)?

Because the Q matrix needs to take into account all the transformations applied to the images, not only the intrinsics contained in Fit: the rotations applied, the cancellation of the old camera intrinsics, and the new intrinsics. These are all contained in K1 and K2.

If you have a specific bug, please share your code and reopen this issue.

Alternatively, I opened a Discussions section where general questions can be asked. Cheers.

The problem I'm facing is that I cannot get any dimensions equal to the real ones, so I'm revising all the concepts to find where the problem is. Currently, I'm using the calibration parameters mentioned above and your code to get a rectified image. Then I take two matching points in the left and right images (uL, vL and the disparity), and apply the Q matrix to get the 3D points (after converting back from homogeneous coordinates).
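That pipeline step can be sketched like this (a minimal sketch; `Q` stands for whatever 4x4 reprojection matrix your rectification produced):

```python
import numpy as np

def reproject(Q, uL, vL, disparity):
    """Map one rectified left-image pixel plus its disparity to a 3D point."""
    X, Y, Z, W = Q @ np.array([uL, vL, disparity, 1.0])
    return np.array([X, Y, Z]) / W  # back from homogeneous coordinates
```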

Now, as I understand it, after rectification we should have a unified intrinsic matrix for both cameras (similar to the ones we get out of OpenCV's stereoRectify). Starting from this point, I have two canonical cameras with a unified intrinsic matrix, so I should be in the very simple case where I can just use a Q matrix built from the unified intrinsic matrix.

What is wrong with this, concept-wise?

In addition, I inspected the "Fit" matrix values
and got this:

```
array([[-6.55342729,  0.        , 93.61818902],
       [ 0.        ,  6.84035067, 73.21410023],
       [ 0.        ,  0.        ,  1.        ]])
```
This surely cannot be an intrinsic matrix, since the numbers are very small (maybe there is some scale?).

To conclude this discussion, the code is right. The final intrinsic matrix you are looking for is K1 for the left camera and K2 for the right one. They are not the same, because rectification algorithms may apply further affine transformations differently to the left and right cameras, as happens for example in the algorithm by Charles Loop and Zhengyou Zhang in "Computing rectifying homographies for stereo vision" (1999), DOI: 10.1109/CVPR.1999.786928.

On top of those affine transformations, we apply another common Fit affine transform to better visualise the two rectified images.

I hope that I clarified the matter. Thanks for asking.

Could you point me to the derivation of the Q matrix in this case? The derivations I can find are for the usual case of two rectified cameras sharing the same intrinsic matrix.

What problem are you having with the code in SimpleStereo? Sorry, I don't get the question. If you are looking for a tutorial on how the Q matrix is calculated, you should refer to the OpenCV documentation.

I would like to know how the Q matrix is derived in SimpleStereo. I know the derivation of the Q matrix calculated in OpenCV (it can be summarized in this answer: https://answers.opencv.org/question/187734/derivation-for-perspective-transformation-matrix-q/).

This Q matrix is different from the one used in SimpleStereo. Basically, from what I understand, the Q matrix in OpenCV assumes that rectification has been done and both cameras now have the same intrinsic matrix.
In SimpleStereo this is not the case; as you said, K1 is different from K2. So how did you arrive at the Q matrix used in SimpleStereo?

Exactly. The SimpleStereo approach is general, as it allows different intrinsic matrices for the left and right cameras.

The calculations I have done are very long and currently on paper. The basic steps are:

  1. Identify the left and right intrinsic matrices, respectively:

```
K1 = | fx   a1   cx1 |
     | 0    fy   cy1 |
     | 0    0    1   |

K2 = | fx   a2   cx2 |
     | 0    fy   cy2 |
     | 0    0    1   |
```

Note the common fx and fy but different remaining parameters.

  2. As done in OpenCV (see here), impose the following equation:
    Q * [ x1 y1 d 1]^T = [X Y Z W]^T

where d = x1 - x2 is the disparity in the rectified images.

  3. Convert x1 and x2 to world coordinates and derive the expression of the vector [X, Y, Z, W]^T in homogeneous coordinates.

  4. Work through the algebra to find the expression of Q that gives the 3D point in world coordinates.
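The steps above can be illustrated for a simplified case, assuming zero skew (a1 = a2 = 0) and fx = fy = f, so that only the principal points differ between the two rectified cameras (the full SimpleStereo derivation also carries the skew terms; the sign conventions below are my own):

```python
import numpy as np

def q_matrix(f, cx1, cx2, cy, b):
    """Reprojection matrix for rectified cameras sharing f and cy but
    with different principal points cx1, cx2; b is the baseline."""
    return np.array([
        [1.0, 0.0, 0.0,     -cx1],
        [0.0, 1.0, 0.0,     -cy],
        [0.0, 0.0, 0.0,      f],
        [0.0, 0.0, 1.0 / b, -(cx1 - cx2) / b],
    ])

# Triangulate one correspondence: (x1, y1) in the left rectified image,
# measured disparity d = x1 - x2. The (cx1 - cx2) term corrects the
# disparity offset introduced by the different principal points.
Q = q_matrix(f=700.0, cx1=320.0, cx2=310.0, cy=240.0, b=0.1)
X, Y, Z, W = Q @ np.array([390.0, 310.0, 17.0, 1.0])
point = np.array([X, Y, Z]) / W  # → [1., 1., 10.]
```

When cx1 == cx2, this reduces to the familiar single-intrinsic-matrix Q derived in the OpenCV answer linked above.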

I suggest reading about homogeneous coordinates if you want to go deeper. Good luck.