Calibration results visualization
After some time struggling with the transforms problems, it is finally possible to compare the pixels of the image points with the reprojected points.
For now, this comparison is only possible for one collection at a time (selected in the command line).
The intrinsic parameters of the top_right_camera taken from the Matlab stereo calibration are too different from what was expected... I still don't understand why that is happening. The transforms are good and the intrinsics from the other camera are good. Because of this, the points from the stereo calibration have a very large pixel offset.
There is a JSON file with the results from the stereo calibration from Matlab:
test/sensor_pose_json_v2/matlab.json
good work! Some comments:
After some time struggling with the transforms problems, it is finally possible to compare the pixels of the image points with the reprojected points.
everyone does.
For now, this comparison is only possible for one collection at a time (selected in the command line).
It is ok for starters.
The intrinsic parameters of the top_right_camera taken from the Matlab stereo calibration are too different from what was expected... I still don't understand why that is happening. The transforms are good and the intrinsics from the other camera are good. Because of this, the points from the stereo calibration have a very large pixel offset.
We should talk by phone. Are you available tomorrow? Can I call you? When?
Best regards,
Miguel
Hi @miguelriemoliveira,
Yes, I am available. You can call me at any time starting from 11 a.m.
Thank you for your help!
Hi @afonsocastro ,
after our phone call I did some searching. This could be helpful:
https://www.learnopencv.com/homography-examples-using-opencv-python-c/
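For reference, a minimal sketch of that idea, using cv2.findHomography with placeholder corner correspondences (the real ones would come from the chessboard detections in each camera):

```python
import cv2
import numpy as np

# Placeholder chessboard corner correspondences (N x 2) detected in the left
# and right images of one collection, e.g. from cv2.findChessboardCorners.
pts_left = (np.random.rand(48, 2) * 640.0).astype(np.float32)
pts_right = pts_left + 5.0  # stand-in for the corners seen by the other camera

# Homography mapping left-image corners to right-image corners.
H, mask = cv2.findHomography(pts_left, pts_right, cv2.RANSAC, 5.0)

# Reproject the left corners with H and measure the pixel error against the
# corners actually detected in the right image.
projected = cv2.perspectiveTransform(pts_left.reshape(-1, 1, 2), H).reshape(-1, 2)
errors = projected - pts_right
print('mean |error| per axis (pixels):', np.abs(errors).mean(axis=0))
```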
Hi @miguelriemoliveira !
In accordance with what we discussed, the Matlab stereo calibration was set aside.
Now, these are the results for the OpenCV homography finder versus our procedure.
This is already for all collections where the chessboard was detected for both of the cameras (16 collections). It gives a total of 768 points (48 points for each collection).
It seems that 85% of the points are within the 30-pixel tolerance; I'm not sure if this is right... Besides this, both results are very similar, although the red dots are less scattered.
Waiting for some feedback
So you managed to get the OpenCV findHomography working. That's great!
The results are quite nice. It seems that our approach gives better results.
Some questions / suggestions:
- give different colors to each collection. This should help us find out if there are one or two collections in particular which do not give good results (we could remove them).
For this you will need to use colormaps. A short sketch is included after this list.
- You can put circles for our approach vs squares for the others, since color will be used to distinguish collections.
- My only concern is when talking about 30 pixels. Usually, the average reprojection error is between 0.3 and 2 pixels. 30 is too much, but from what I understand 30 is the maximum reprojection error. Can you compute the average for comparison?
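A minimal sketch of that colormap idea (the error arrays and collection numbers below are placeholders, not the real evaluation output):

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical per-collection reprojection errors: {collection_id: (N, 2) array
# of x/y pixel errors}. In practice these would come from the evaluation script.
errors = {3: np.random.randn(48, 2), 7: np.random.randn(48, 2) * 2.0,
          12: np.random.randn(48, 2) * 0.5}

# One distinct color per collection, taken from a matplotlib colormap.
cmap = plt.cm.get_cmap('tab20', len(errors))

fig, ax = plt.subplots()
for i, (collection, err) in enumerate(sorted(errors.items())):
    # circles for the proposed approach; squares could mark the OpenCV results
    ax.scatter(err[:, 0], err[:, 1], marker='o', color=cmap(i),
               label='collection %d' % collection)

ax.set_xlabel('x error (pixels)')
ax.set_ylabel('y error (pixels)')
ax.legend()
plt.show()
```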
Great work!
Hi @miguelriemoliveira,
Here are the results of errors by collection and by both of the verification procedures. You can also see the average error (for all 16 collections) in pixels by each axis:
AVERAGE ERROR (our calib):
x = 15.9574686686 ; y = 14.4347419739
AVERAGE ERROR (openCV):
x = 23.5568695068 ; y = 23.9388504028
As we can see, there are some collections that give bad results... and the average error is far from the 0.3 to 2 pixel range that you talked about.
So, as you suggested, I removed the 6 worst collections. Obviously, the average error decreased, but it is still around 9 to 10 pixels. This isn't very encouraging! Take a look at the graph and the results:
AVERAGE ERROR (our calib):
x = 9.35389404297 pix ; y = 10.2167816162 pix
AVERAGE ERROR (openCV):
x = 11.2116719564 pix ; y = 15.2526662191 pix
Do you have any idea about what is happening?
Anyway, our calibration procedure is better than the homography finder function of OpenCV which is very nice! :D
First of all, results look very nice. Using the colormap really improves the quality of the graphics.
Some tips to improve further:
- legend: our approach -> proposed approach (that's how it will be on the paper)
- legend: "pixel error with our" should be removed, if needed this information should be in the title
- axes legend: y offset [pixels] -> y error (pixels) (also for x)
- Use a different colormap that does not carry an association with good or bad. In this one, it may appear as if the red ones are bad and the green ones are good, which is not true. Use something without red. I often use Pastel1 or Pastel2
https://matplotlib.org/users/colormaps.html
Now for the difficult part: Why is our absolute error so high?
The good news is that it should not be a problem with our approach, since we have the same errors when using the OpenCV approach. So that leads me to question the quality of the dataset.
Some ideas:
- are we taking into account the distortion parameters?
- One of the cameras was not operating very well (very slow). Perhaps there is a de-synchronization effect which causes errors. Suppose one image is taken at time t and the other at time t+x; if the chessboard is moving then, since we assume both are from time t, we get high reprojection errors. When taking collections, were you careful to select moments in which the chessboard was not moving?
- Can you try findHomography with a stereo dataset from the internet?
- Can you try our approach with a stereo dataset from the internet?
- We should try to fix the frontal camera's low frame rate (We must take a new bag file)
We can speak by phone to try to determine a course of action.
Well... good and bad news: I found a good stereo dataset from the internet! In this dataset, there are 9 pairs of photos where the chessboard is detected by both of the cameras, so we have 9 "collections". This dataset also contains information about the intrinsic parameters of the cameras.
I created a specific script to generate the JSON file that is needed for the optimization procedure. Here's the calibration:
If you zoom into the 3D graph, you can see that the cameras are side by side, because this is a stereo dataset.
So, that was great news, because it allows us to draw robust conclusions about where the high-error problem comes from!
The bad news is that, with this internet dataset, the results of the OpenCV homography finder are very good but the results of our proposed approach aren't within the desired limits:
AVERAGE ERROR (our calib):
x = 8.28955906997 pix ; y = 2.48023365162 pix
AVERAGE ERROR (openCV):
x = 0.396058400472 pix ; y = 0.258330192095 pix
The triangles are so close to the graph origin that it is difficult to see them.
So, my thoughts:
1 - The results visualization has no problem (given this good result for OpenCV with the model dataset).
2 - Our dataset has low quality; we really should take a new one.
3 - I still haven't found where the problem in our calibration procedure is, but I think that with this dataset it will be easier to find out.
That's good news. We have a standard high-quality dataset. Some comments:
AVERAGE ERROR (our calib):
x = 8.28955906997 pix ; y = 2.48023365162 pix
Yep, something is going wrong. We should be close to OpenCV's numbers.
AVERAGE ERROR (openCV):
x = 0.396058400472 pix ; y = 0.258330192095 pix
Yes, these are the typical values.
The triangles are so close to the graph origin that it is difficult to see them.
That will change once our approach has smaller errors, so let's not worry about it.
So, my thoughts:
1 - The results visualization has no problem (given this good result for OpenCV with the model dataset).
Not entirely sure. How do you compute the projection of the pixels? You don't take distortion into account, do you? I think in our optimization procedure we do.
That could be a difference between the optimization and the visualization ...
2 - Our dataset has low quality; we really should take a new one.
Definitely. But let's stick with the "standard dataset" until we figure out what's wrong.
3 - I still haven't found where the problem in our calibration procedure is, but I think that with this dataset it will be easier to find out.
What is the reported error during the optimization? I think if you use only cameras it is in pixels and you can directly compare. Is it below 1? If so, then I think your visualization has something wrong. If not, then the optimization is not well parameterized. Check this line:
Great work!
Yes, in our optimization we take distortion into account. In the results visualization, I didn't compute it because I don't know (yet) how to relate distortion with the pixel reprojection.
I will think about it in order to get to some solution that makes sense.
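As a starting point, a minimal sketch of how distortion could enter the reprojection through cv2.projectPoints, with placeholder pose, intrinsics and distortion coefficients (not the actual values from the dataset):

```python
import cv2
import numpy as np

# Placeholder chessboard corners in the chessboard frame (Z = 0), camera pose
# (rvec, tvec), intrinsic matrix K and distortion coefficients D -- none of
# these are the real dataset values.
square = 0.025  # assumed square size in meters
objp = np.zeros((9 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2) * square

rvec = np.zeros(3)                # chessboard-to-camera rotation (Rodrigues vector)
tvec = np.array([0.0, 0.0, 1.0])  # chessboard-to-camera translation
K = np.array([[700., 0., 320.], [0., 700., 240.], [0., 0., 1.]])
D = np.array([0.1, -0.05, 0.001, 0.001, 0.0])  # k1, k2, p1, p2, k3

# cv2.projectPoints applies the full pinhole + distortion model, so the
# visualization would then use the same projection as the optimization.
pixels, _ = cv2.projectPoints(objp, rvec, tvec, K, D)
print(pixels.reshape(-1, 2)[:3])
```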
Also, I don't know the influence of that scale factor... In this new dataset, all images have the same size so that shouldn't make any difference. I will study that as well.
For now, here are the results of the optimization procedure, only considering the cameras (for direct comparison pixels to pixels):
If ftol = 0.1:
Average error = 3.54325120363
ftol termination condition is satisfied.
If ftol = 0.02:
Average error = 3.49996586648
ftol termination condition is satisfied.
If ftol = 0.001:
Average error = 3.10085084103
ftol termination condition is satisfied.
Actually, these seem quite a bit better, but not as good as OpenCV's homography finder. I'm gonna sleep now, but tomorrow I will study the influence of the other parameters.
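For context, the ftol messages above presumably come from scipy's least_squares; a minimal sketch of how the tolerance enters the call (the residual function below is just a placeholder, not our actual objective):

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params):
    # Placeholder objective: in the real optimization this would return the
    # per-corner pixel reprojection residuals for the current camera poses.
    return np.array([params[0] - 1.0, 10.0 * (params[1] - 2.0)])

result = least_squares(residuals, x0=np.zeros(2), ftol=1e-3)
print(result.message)             # reports which tolerance triggered termination
print(np.abs(result.fun).mean())  # average absolute residual at the solution
```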
Hi!
1 - scale factor:
As I said, the scale factor should only matter when the two images have different sizes, which isn't the case in this new dataset. Here are the results of the errors without this scale factor:
AVERAGE ERROR (our approach):
x = 9.08917763792 pix ; y = 7.08498429663 pix
AVERAGE ERROR (openCV):
x = 1.09436138765 pix ; y = 0.618174376311 pix
They got worse for both methods, so I will leave the factor as a correction parameter.
2 - distortion:
After using the objective function with the projectwithoutdistorcion, the results of our approach actually improved (x error about 2.1 pix, y error about 0.2 pix):
AVERAGE ERROR (our calib):
x = 6.15383421345 pix ; y = 2.20037088276 pix
AVERAGE ERROR (openCV):
x = 0.396058400472 pix ; y = 0.258330192095 pix
But they're not what was expected... Do you think that including the distortion parameters in the analysis of the results could be what is missing? If so, shouldn't this test already give us the right errors?
3 - time running:
With ftol at 1e-3, the running time of the optimization is a bit more than 6 minutes. I think this is a lot.
Hi @afonsocastro ,
Sorry for the delayed response. I was finishing my vacation and decided to wait for the first day of work to think about this.
Good work. We are making very good progress!
1 - scale factor:
As I said, the scale factor should only matter when the two images have different sizes, which isn't the case in this new dataset. Here are the results of the errors without this scale factor:
AVERAGE ERROR (our approach):
x = 9.08917763792 pix ; y = 7.08498429663 pix
AVERAGE ERROR (openCV):
x = 1.09436138765 pix ; y = 0.618174376311 pix
They got worse for both methods, so I will leave the factor as a correction parameter.
OK, I don't understand this very well yet, we should discuss it in person.
2 - distortion:
After using the objective function with the projectwithoutdistorcion, the results of our approach actually improved (x error about 2.1 pix, y error about 0.2 pix):
AVERAGE ERROR (our calib):
x = 6.15383421345 pix ; y = 2.20037088276 pix
AVERAGE ERROR (openCV):
x = 0.396058400472 pix ; y = 0.258330192095 pix
Hum, the average error is "x error about 2.1 pix, y error about 0.2 pix" but in your numbers above it's x = 6.1 and y = 2.2? It should be the same, no?
But they're not what was expected... Do you think that including the distortion parameters in the analysis of the results could be what is missing? If so, shouldn't this test already give us the right errors?
Not sure, let's discuss.
3 - time running:
With ftol at 1e-3, the running time of the optimization is a bit more than 6 minutes. I think this is a lot.
Yes, this should be enough to get to a very accurate result.
I suggest the following simple test.
Change the optimization code to project some fixed 3D point, and pass it through the pipeline to see to which pixel coordinates the point is transformed (taking the camera pose and intrinsics into consideration).
Next, do the same using the same 3D point and same camera pose and intrinsics in your evaluation code. The xpix ypix values for the projection should be the same (to the 8th or 9th decimal place).
I suspect these values are different, and that will explain the "bad" results we are getting.
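A minimal sketch of such a consistency check, assuming placeholder pose and intrinsics and comparing an OpenCV projection against a manual pinhole projection (the actual pipeline functions would replace these):

```python
import cv2
import numpy as np

# Fixed test point in the chessboard frame plus one camera's pose and intrinsics
# (all placeholders; in practice they would be read from the first-guess JSON).
point_3d = np.array([[0.1, 0.05, 0.0]])
rvec = np.array([0.01, -0.02, 0.0])
tvec = np.array([0.0, 0.0, 1.2])
K = np.array([[700., 0., 320.], [0., 700., 240.], [0., 0., 1.]])
D = np.zeros(5)

# Projection as one side of the pipeline might do it, via OpenCV ...
pix_a, _ = cv2.projectPoints(point_3d, rvec, tvec, K, D)
pix_a = pix_a.reshape(2)

# ... and a manual pinhole projection, as the evaluation code might do it.
R, _ = cv2.Rodrigues(rvec)
p_cam = R @ point_3d[0] + tvec
pix_b = (K @ (p_cam / p_cam[2]))[:2]

# If both pipelines agree, the difference should be around 1e-9 pixels.
print(np.abs(pix_a - pix_b))
```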
We should meet this week. Is tomorrow or Wednesday ok for you?
Miguel
Hi @miguelriemoliveira,
I hope you had a great vacation, thanks for the continued help and feedback!
About our meeting, yes. Tomorrow morning is ok for me, and Wednesday as well. Could it be tomorrow, at 10 a.m.?
For tomorrow's discussion:
If the results evaluation function works with the projected pixels (and not with the ground-truth pixels), the OpenCV homography finder has a bigger error than our approach:
AVERAGE ERROR (our calib):
x = 7.49847713518 pix ; y = 1.81251488203 pix
AVERAGE ERROR (openCV):
x = 11.7356160482 pix ; y = 6.17495087047 pix
Hi! Good and bad news:
In comparison with the OpenCV function (calibrateCamera, used to get the sensor-to-chessboard transform needed for our reprojection error procedure), our optimization has better results!
These are the results after calibrating the sensors' poses with 9 collections:
AVERAGE ERROR (our optimization):
x = 8.23603048442 pix ; y = 1.97276852455 pix
AVERAGE ERROR (openCV calibrate camera):
x = 22.6928228684 pix ; y = 3.23359887394 pix
The bad news is that our code has some bugs. The results with only one collection show that the pixel error got bigger in comparison to the 9-collection study. The OpenCV calibrateCamera function actually got better, as was expected:
AVERAGE ERROR (our optimization):
x = 80.6875678168 pix ; y = 34.6731363932 pix
AVERAGE ERROR (openCV calibrate camera):
x = 6.31464979384 pix ; y = 1.34845966763 pix
I'm going to think about this; do you have any idea? Maybe some test to pinpoint where the problem is?
Hi @miguelriemoliveira ,
yes I am available. Wednesday, at 2 pm?
Afonso
Hi,
after our meeting, I've implemented our conclusions in order to also get the comparison with the results of OpenCV's stereo calibration function. For now, these are the results:
AVERAGE ERROR (our optimization):
x = 8.83484188127 pix ; y = 2.65536649728 pix
AVERAGE ERROR (openCV stereo calibration):
x = 4.5144050504 pix ; y = 0.95115454403 pix
AVERAGE ERROR (openCV calibrate camera):
x = 27.3388310185 pix ; y = 29.2516185619 pix
I recall that this is, as we know, a bad dataset, because the pattern has rectangles instead of squares and we don't know the size of the rectangles.
I will try to test it tonight with our new dataset to see the results.
I am really excited to see the results in a "good dataset" ...
... and this is already uniform in terms of comparison?
Hi @miguelriemoliveira!
After solving some problems I found, here are the results of the good dataset... They look very nice!! 👍
First of all, the first part of our approach (creating the sensors' pose first guess, labeling data and collecting snapshots) only worked for 8x6 chessboards (the old chessboard). Now it requires the number of squares as an input argument to create the original JSON file, so the code is more robust! (readme updated).
Here are the results:
AVERAGE ERROR (our optimization):
x = 0.148268815354 pix ; y = 0.188933897445 pix
AVERAGE ERROR (openCV stereo calibration):
x = 0.161108901218 pix ; y = 0.221039052211 pix
AVERAGE ERROR (openCV calibrate camera):
x = 0.180819144803 pix ; y = 0.216612541813 pix
I am very happy with these results because they are all within the expected ranges. More than that, our approach reached a better sensor configuration than the OpenCV tools!
It's important to remember that all of this is only for cameras and that the square size was now the real one (I think this is mainly responsible for the difference in results compared to the Internet dataset).
Some notes about this test:
1 - Time running:
Our optimization ---> ~ 40 minutes
openCV calibrate camera ---> 1 minute (maybe less)
openCV stereo calibration ---> few seconds (quickest)
2 - All chessboard corners were taken into account (9x6=54).
29 collections were studied.
Total studied points (for each procedure): 1566
3 - Our optimization worked with the distortion parameters, as did the OpenCV tools. The results visualization did not (as always).
The comparison of the results is uniform: the sensor 1 to chessboard transform was found using solvePnP (with the intrinsic parameters computed by each procedure, respectively). The sensor 2 to chessboard transform was taken directly from the final JSON file of each calibration (for stereo, it required combining with the sensor1-to-chessboard transform found by solvePnP).
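A minimal sketch of that comparison, with placeholder detections, intrinsics and sensor-to-sensor transform (the real values would come from the dataset and from each calibration's JSON):

```python
import cv2
import numpy as np

# Placeholder inputs: chessboard corners in the chessboard frame (objp), their
# detections in sensor 1's image, and sensor 1's intrinsics / distortion.
square = 0.105  # assumed square size in meters
objp = np.zeros((9 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2) * square
corners_s1 = (np.random.rand(54, 1, 2) * 640.0).astype(np.float32)  # stand-in detections
K1 = np.array([[700., 0., 320.], [0., 700., 240.], [0., 0., 1.]])
D1 = np.zeros(5)

# Chessboard pose in sensor 1's frame, from its detections and intrinsics.
_, rvec1, tvec1 = cv2.solvePnP(objp, corners_s1, K1, D1)
T_s1_chess = np.eye(4)
T_s1_chess[:3, :3], _ = cv2.Rodrigues(rvec1)
T_s1_chess[:3, 3] = tvec1.ravel()

# Sensor1 -> sensor2 transform taken from the calibration under test
# (identity here as a placeholder), combined to get the chessboard in sensor 2.
T_s2_s1 = np.eye(4)
T_s2_chess = T_s2_s1 @ T_s1_chess
rvec2, _ = cv2.Rodrigues(T_s2_chess[:3, :3].copy())
tvec2 = T_s2_chess[:3, 3]

# Project into sensor 2 and compare with the corners detected there.
K2, D2 = K1.copy(), np.zeros(5)
proj_s2, _ = cv2.projectPoints(objp, rvec2, tvec2, K2, D2)
corners_s2 = corners_s1  # placeholder for sensor 2's detections
err = proj_s2.reshape(-1, 2) - corners_s2.reshape(-1, 2)
print('average error (pixels):', np.abs(err).mean(axis=0))
```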
As we can see in the results graph, collection 11 or 8 or 9 (not sure which) was the worst, by far, for all approaches. I can run everything again without this collection to see if the results get better.
Or even run our optimization only with the 4 chessboard corners, to test the hypothesis we talked about.
Hi!
1 - I ran the optimization without the visual graphics part and it took less than a minute to finish. The 40 minutes mentioned before were only because of the graphics, since this test was made with exactly the same number of studied points.
2 - I tried our calibration and the OpenCV calibrations without collection 11 to see if the results would get better and, to my surprise, the optimization results got a bit worse... stereo and camera calibration actually improved their errors:
Total studied points (for each procedure): 1512
AVERAGE ERROR (our optimization):
x = 0.14696353327 pix ; y = 0.238686718007 pix
AVERAGE ERROR (openCV stereo calibration):
x = 0.13414824955 pix ; y = 0.132025965938 pix
AVERAGE ERROR (openCV calibrate camera):
x = 0.112190791539 pix ; y = 0.13600612822 pix
3 - Test of the optimization using only the four chessboard corners as residuals (also without collection 11):
AVERAGE ERROR (our optimization):
x = 0.398715105006 pix ; y = 0.290294707768 pix
AVERAGE ERROR (openCV stereo calibration):
x = 0.13414824955 pix ; y = 0.132025965938 pix
AVERAGE ERROR (openCV calibrate camera):
x = 0.112190791539 pix ; y = 0.13600612822 pix
As we can see, there was an error increase in the optimization. This average error is computed using all the chessboard corners, which makes me conclude that the final sensor pose is not as accurate as with the all-corners calibration. The elapsed time was similar to the previous experiment, so the difference is only in the graphics.