Repo to produce stereo depth map, do mapping and navigation
- Stereo Calibration tab
- Stereo Calibration tips
- Block matching tab
- Block matching tips
- Point cloud visualization
- Calibration and block matching links
- Feature detection tab
- Feature detection links
- Motion estimation tab
- Motion estimation links
- Mapping
- Mapping links
This repository is meant to be a proof of concept for stereo vSLAM. It is not an optimal solution. It contains calibration, depth map generation, feature matching, motion estimation and mapping. I was able to reach a basic level of efficacy: showing the trajectory of the camera and drawing a point cloud that somewhat resembles the room it was moving in.
My fixture was a homemade stereo camera consisting of two Logitech C270 webcams mounted 75 mm apart.
This solution does not cover the usage of lidar, or IMU sensors. It solely relies on a stereo camera.
The app was tested on Ubuntu 20.04 with Python 3.8. I can't guarantee it works on Windows.
To install dependencies run
pip install -r requirements.txt
sudo apt install python3-pyqt5.qtopengl
To run the app run python main.py
Think about this app as a pipeline where each tab represents a step towards a complete vSLAM solution.
- Start with the calibration tab and set up your camera matrices.
- Once done, jump to the second tab and start block matching and depth map generation. Tweak your parameters as you like.
- Then, in the third tab, get a feel for the different feature matching algorithms.
- The next tab shows the camera's trajectory, drawn as you move it.
- The last tab shows the mapping of the environment and the trajectory.
- Save settings - saves all UI control values into a settings npz file. It also saves feature-specific info and files to locations whose name is identical to the settings name. Saving the settings also generates a lastSaved folder/folder, which is used by Default load on startup.
- Load settings - loads saved UI control values and additional data/files.
- Default load on startup - when the application starts, it immediately loads the latest saved settings.
- Swap lenses - each tab has this feature. It allows the user to swap left and right cameras in case the app recognized them in the wrong order
- Select camera indices - each tab has this feature. It allows the user to custom select a camera device. It will exit the app if left and right indices are the same. Useful, when there are more than two cameras connected to the system.
It allows the user to generate chessboard calibration images and calibrate the stereo camera for later depth map streaming. For this project I bought two Logitech C270 webcams and glued them into a fairly solid cardboard box.
"Start" - when the application is started, no mode is set up. To start video streaming in calibration mode, press 'Start'.
"Take image" - calibration images need to be captured first. Press 'n' or 'Take image' to store an image showing the chessboard pattern.
"Process" - when enough chessboard images are taken, by pressing 'Process' it will start calibrating the cameras and create the final calibration data
Taking the calibration images can happen in two ways:
"Simple mode" - in this case, the user just presses 'Take image'/'n' until enough images are created.
"Advanced mode" - in this case, whenever a chessboard image is captured, the code runs stereo calibration on the existing set of images and reports the RMS reprojection error. This shows how well the 3D points match the 2D points after projection. If it is over a threshold, the image pair is thrown away.
As more images are taken, the RMS may increase, and a fixed RMS threshold would make it impossible to take new images. For this reason the threshold is incremented every time an image is taken successfully. To stop the RMS from getting out of control, there is a maximum limit the threshold can be increased to.
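The accept/discard logic of Advanced mode can be sketched roughly as below. This is only an illustration of the idea, not the app's actual code: the function name `try_accept_pair` is hypothetical, and the argument names simply mirror the `rms_limit`, `increment` and `max_rms` parameters of this tab.

```python
def try_accept_pair(rms, rms_limit, increment, max_rms):
    """Return (accepted, new_rms_limit) for one newly captured image pair.

    Hypothetical helper: `rms` is the calibration RMS computed with the
    new pair included.
    """
    if rms > rms_limit:
        # Calibration got worse than allowed: discard the pair, keep the limit.
        return False, rms_limit
    # Pair accepted: relax the limit by one step, but never beyond max_rms.
    return True, min(rms_limit + increment, max_rms)


rms_limit = 0.13
for rms in (0.12, 0.14, 0.131):
    accepted, rms_limit = try_accept_pair(rms, rms_limit, increment=0.005, max_rms=0.27)
    print(accepted, round(rms_limit, 3))
```

Capping the limit with `min` is what keeps the threshold from drifting upwards without bound during a long capture session.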
- Parameters:
- calib_image_index - index of the last taken image. It is useful when the calibration image generation is interrupted (the app is closed) and the user wants to continue where they left off.
- rms_limit - represents the current RMS limit. If the calibration RMS is larger than this when a new image pair is taken, the pair is discarded.
- increment - represents the step by which the RMS limit is increased after each successful image pair
- max_rms - represents the maximum allowed RMS
"Ignore existing image data" - ignores the existing chessboard.npz information when trying to calibrate the sensors. Note: the existing calibration images are not deleted or ignored, only the generated chessboard corner data, image points and object points.
Performance can be quite different with different resolutions, hence I added the option to change it. Note: the Block Matching resolution MUST be the same as the calibration resolution for appropriate results.
When settings are saved, the calibration images are also stored in the calibImages folder, in the appropriate left/right sensor folder, under the folder name identical to the settings name.
First things first: I am NOT an image/video processing specialist, and there are more advanced people to tell you the right answer, see Sources for camera calibration and depth map generation. This project is just a tiny Minimum Viable Product for a larger project I am trying to learn more about.
My tips are based on personal experience.
- If you have a DIY stereo camera, make sure the lenses are aligned and focused as well as possible. During my first attempt, one camera had a different pitch and I was very surprised when the depth map was nonsense.
- If you choose a baseline distance (distance between the lenses) that is too small, depth detection will not be very good (too large isn't good either). I first had some 60 mm, which seemed a bit small, so I changed it to 75 mm in the final design.
- It is very important to have a very accurate chessboard pattern. I think it cannot be stressed enough. I had many failed attempts because there were slight bumps in my paper pattern stuck on the wall. A pattern on solid material is probably preferable.
- Do not use a square-shaped pattern; column and row count shall not be equal.
- Probably less of a concern, but use a fixed stand for your camera when taking pictures. Shakiness can be problematic, especially if your cameras are not synced, like mine.
- Make sure the left and right cameras are not swapped; use the UI to swap them if they are.
- According to the wise people on the internet, an RMS value below 0.5 is acceptable. I tried to keep it between 0.1 and 0.3.
- In Advanced mode I usually set a start RMS limit of 0.13-0.15, with an increment of 0.005 and a max RMS of 0.27-0.3.
- Take many pictures from different angles and distances. I got low-RMS image sets at distances of 300 mm - 2000 mm from the chessboard pattern. I usually took 30-50 images.
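To illustrate the baseline tip above: for a rectified stereo pair, depth follows the pinhole relation Z = f * B / d, so a larger baseline B gives more disparity per unit of depth. The sketch below compares the depth error caused by a one-pixel disparity step for 60 mm and 75 mm baselines. The focal length of 700 px is a made-up placeholder, not a measured value for the C270, and both function names are hypothetical.

```python
def depth_mm(disparity_px, focal_px, baseline_mm):
    """Pinhole stereo relation Z = f * B / d for a rectified image pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def depth_step_mm(depth, focal_px, baseline_mm):
    """Depth change caused by one extra pixel of disparity at a given depth."""
    disparity = focal_px * baseline_mm / depth
    return depth - depth_mm(disparity + 1, focal_px, baseline_mm)


FOCAL_PX = 700.0  # assumed placeholder, take the real value from your calibration

for baseline in (60.0, 75.0):
    step = depth_step_mm(2000.0, FOCAL_PX, baseline)
    print(f"baseline {baseline:.0f} mm: 1 px of disparity ~ {step:.1f} mm at 2 m")
```

The step shrinks as the baseline grows, which is consistent with the wider setup giving better depth detection at range. The price is a larger minimum distance at which both cameras still see the same object.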
This tab allows the user to run block matching on the stereo image stream and generate a depth map for further use. It has a UI that helps to configure the BM for better results.
The majority of the following description is copied from this OpenCV answer
Block matching type - the app supports configuring both Block Matching (BM) and Semi-Global Block Matching (SGBM), check the links below if you want to learn more about each, also google is your friend
- Common Parameters
- Minimum disparity - is the smallest disparity value to search for. Use smaller values to look for scenes which include objects at infinity, and larger values for scenes near the cameras. Negative minDisparity can be useful if the cameras are intentionally cross-eyed, but you wish to calculate long-range distances. Setting this to a sensible value reduces interference from out of scene areas and reduces unneeded computation.
- Number of disparity - is the range of disparities to search over. It is effectively your scene's depth of field setting. Use smaller values for relatively shallow depth of field scenes, and large values for deep depth-of-field scenes. Setting this to a sensible value reduces interference from out of scene areas and reduces unneeded computation.
- Block size - is the dimension (in pixels on a side) of the block which is compared between the left and right images. Setting this to a sensible value reduces interference from out of scene areas and reduces unneeded computation.
- Uniqueness ratio - used in filtering the disparity map before returning to reject small blocks. May reduce noise.
- Speckle window size, speckle range, Disp12MaxDiff - used in filtering the disparity map before returning, looking for areas of similar disparity (small areas will be assumed to be noise and marked as having invalid depth information). These reduce noise in the disparity map output.
- BM parameters
- Texture threshold - used in filtering the disparity map before returning. May reduce noise.
- prefilter type/size/cap - used in filtering the input images before disparity computation. These may improve noise rejection in input images.
- smallerBlockSize - in theory this should reduce noise, but I couldn't produce any effect
- SGBM parameters
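P1, P2 - smoothness penalties applied when the disparity changes between neighboring pixels (P1 for a change of 1, P2 for larger changes). Larger values give a smoother disparity map and may reduce noise. They are not settable in the UI anymore, but are calculated from the block size.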
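As a rough sketch of how P1 and P2 can be derived from the block size: the OpenCV documentation suggests `8 * channels * blockSize**2` and `32 * channels * blockSize**2` as reasonable values. Whether this app uses exactly that formula is an assumption. The helper name `sgbm_params` is hypothetical; the dictionary keys match the keyword arguments of `cv2.StereoSGBM_create`.

```python
def sgbm_params(min_disparity, num_disparities, block_size, channels=1):
    """Build keyword arguments for an SGBM matcher (hypothetical helper).

    P1/P2 follow the values suggested in the OpenCV documentation;
    the app's own formula may differ.
    """
    if num_disparities <= 0 or num_disparities % 16 != 0:
        raise ValueError("num_disparities must be a positive multiple of 16")
    if block_size < 1 or block_size % 2 == 0:
        raise ValueError("block_size must be a positive odd number")
    return {
        "minDisparity": min_disparity,
        "numDisparities": num_disparities,
        "blockSize": block_size,
        "P1": 8 * channels * block_size ** 2,   # penalty for a disparity change of 1
        "P2": 32 * channels * block_size ** 2,  # penalty for larger changes, must exceed P1
    }


params = sgbm_params(min_disparity=7, num_disparities=112, block_size=5)
print(params)
```

With OpenCV installed, such a dictionary could be passed on as `cv2.StereoSGBM_create(**params)`.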
I've got two solutions:
- Using OpenGL (pyqt compatible)
It is a more sophisticated tool which allows high performance rendering. See the original example. I adapted it here to draw depth map numpy arrays.
- Using pyqtgraph
It is a simple visualizer tool that takes the depth map as input and generates a 3D point cloud representation. Check the original scatterplot example.
There is also a minimal example if you are only interested in this widget and not the whole app. It is pyqt compatible.
Note: it uses the pyqtgraph scatterplot. It is not a very fast way to render points, but good for initial development.
- Parameters
- fov - sets the field of view
- samplingRatio - allows the user to change how many points to show. For example, selecting 10 will show only every 10th point. Improves performance. For me, settings > 200 were reasonably fast.
- ignore depth - Ignores drawing points that have a depth more than the limit.
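The effect of samplingRatio and the depth limit can be sketched as follows. This is a plain-Python illustration under assumptions, not the widget's code: the depth map is taken as rows of depth values, the intrinsics `fx`, `fy`, `cx`, `cy` are assumed to come from calibration, and "every Nth point" is interpreted as every Nth pixel in row-major order.

```python
def depth_to_points(depth, fx, fy, cx, cy, sampling_ratio=1, max_depth=None):
    """Back-project a depth map into a list of (x, y, z) points.

    Hypothetical helper using the pinhole model:
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    points = []
    index = 0
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            keep = index % sampling_ratio == 0  # only every Nth pixel
            index += 1
            if not keep or z <= 0:
                continue  # skipped by sampling, or no valid depth
            if max_depth is not None and z > max_depth:
                continue  # farther than the "ignore depth" limit
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


depth_map = [[1.0, 2.0], [3.0, 4.0]]
print(depth_to_points(depth_map, fx=1.0, fy=1.0, cx=0.0, cy=0.0, sampling_ratio=2))
```

A real implementation would more likely use numpy slicing for speed. The loop form is used here only to keep the example dependency-free.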
Performance can be quite different with different resolutions, hence I added the option to change it. Note: the Block Matching resolution MUST be the same as the calibration resolution for appropriate results.
Since I am not really knowledgeable in this area, I basically did a lot of trial and error and tweaked each parameter to see its effect on the depth map. I feel that if you get the calibration right, block matching with default values will already show promising results.
- My config was working the best with
  - SGBM
  - Min disparity: 7
  - Num disparity: 112
  - Block size: 5
  - Uniqueness ratio: 4
  - Speckle window size: 1
  - Speckle range: 1
  - Disp12MaxDiff: 1
  - P1: 99
  - P2: 999
- Make sure the left and right cameras are not swapped; use the UI to swap them if they are.
- I noticed that P1 and P2 had a major effect on the speckle reduction.
- Check the links below; those guys had different setups. My values are generally an average of the other documented attempts.
- If you get nonsense, or an extreme amount of speckle and very little depth map with a roughly similar config, then most likely your lenses are not well aligned or your chessboard pattern is not accurate enough.
https://github.com/SoonminHwang/py-pointcloud-viewer
mmatl/pyrender#14
https://answers.opencv.org/question/182049/pythonstereo-disparity-quality-problems/
https://becominghuman.ai/stereo-3d-reconstruction-with-opencv-using-an-iphone-camera-part-i-c013907d1ab5
https://github.com/OmarPadierna/3DReconstruction
https://learnopencv.com/depth-perception-using-stereo-camera-python-c/
https://gist.github.com/aarmea/629e59ac7b640a60340145809b1c9013#file-2-calibrate-py
In order to get visual odometry working, features have to be detected and matched on two consecutive images. This makes it possible to calculate navigation data from only a few points of the images instead of every single point.
Again, not an expert here. I copy-pasted and adapted most of the code from here. Note: This repo is a great tutorial for overall stereo calibration and vSLAM.
The tab shows the matches between two images captured by the same lens at different times.
There are only a few parameters present. In order to connect to the cameras and produce the matching, press Start
- Parameters
- Feature detector - allows the user to change the detector algorithm between sift, orb and surf. Note surf is not supported with BF matcher in this app. It will terminate with an exception.
- Feature matcher - allows the user to change the matching algorithm ("BF" brute force and FLANN)
- Max allowed distance between best matches - Filters all the best matches that have a distance below this limit
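The distance filter can be pictured with the small sketch below. It assumes that matches at or below the limit are the ones kept, which is my reading of the parameter rather than something confirmed by the app. `Match` is a stand-in for `cv2.DMatch`, which has attributes with the same names, and `filter_matches` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for cv2.DMatch so the example runs without OpenCV.
Match = namedtuple("Match", ["queryIdx", "trainIdx", "distance"])


def filter_matches(matches, max_distance):
    """Keep matches whose descriptor distance is at most max_distance, best first."""
    kept = [m for m in matches if m.distance <= max_distance]
    return sorted(kept, key=lambda m: m.distance)


matches = [Match(0, 3, 42.0), Match(1, 7, 120.5), Match(2, 2, 18.3)]
for m in filter_matches(matches, max_distance=50.0):
    print(m.queryIdx, m.trainIdx, m.distance)
```

Note that a sensible limit depends on the detector: ORB descriptors are compared with Hamming distance, SIFT descriptors with L2 distance, so the same number means different things.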
https://www.youtube.com/watch?v=2GJuEIh4xGo
https://github.com/FoamoftheSea/KITTI_visual_odometry
https://scialert.net/fulltext/?doi=itj.2009.250.262
Based on the previous tab all essential elements are ready to estimate motion poses and generate a trajectory. This tab will allow the user to do minor tweaks in the estimation and visualize trajectory.
To gain more understanding, please check the links below. In a nutshell, the key points (feature points found during feature matching) are projected into 3D space using the depth map. Using the resulting object points and the original image keypoints, a pose is generated (rotation and translation). This is then used to build a trajectory.
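Building the trajectory from per-frame poses can be sketched as below. The sketch assumes each estimated (R, t) maps points from the previous camera frame into the current one, as PnP-style solvers return, so the camera's own motion is the inverse of that transform. All function names are hypothetical and plain lists are used instead of numpy to keep the example dependency-free.

```python
def invert_rigid(rotation, translation):
    """Invert x' = R x + t, giving R^T and -R^T t."""
    r_inv = [[rotation[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(r_inv[i][k] * translation[k] for k in range(3)) for i in range(3)]
    return r_inv, t_inv


def compose(pose, rotation, translation):
    """Chain a relative motion (R, t) onto the current world pose."""
    pose_r, pose_t = pose
    new_r = [[sum(pose_r[i][k] * rotation[k][j] for k in range(3)) for j in range(3)]
             for i in range(3)]
    new_t = [pose_t[i] + sum(pose_r[i][k] * translation[k] for k in range(3))
             for i in range(3)]
    return new_r, new_t


def build_trajectory(frame_motions):
    """Return the camera positions, starting at the origin."""
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    pose = (identity, [0.0, 0.0, 0.0])
    trajectory = [tuple(pose[1])]
    for rotation, translation in frame_motions:
        pose = compose(pose, *invert_rigid(rotation, translation))
        trajectory.append(tuple(pose[1]))
    return trajectory


eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Scene points come 1 unit closer along z each frame, so the camera moves forward.
print(build_trajectory([(eye, [0.0, 0.0, -1.0]), (eye, [0.0, 0.0, -1.0])]))
```

Because every pose is chained onto the previous one, a single bad estimate shifts all later positions, which is why drift accumulates over time.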
In order to start motion estimation press Start.
The top graph shows the trajectory points in 3D space. The first graph in the second row is the depth map. The second is a line graph showing the depth (z) against the cycle count. The third graph shows the x, y coordinates.
- Parameters
- inliers - sets the minimum number of inliers that must be found to accept the pose estimation. Using the homebrew stereo camera I got a lot of false poses, making jumps in the trajectory. Adding this limit reduces the number of jumps.
- Max depth - any key points (feature points) that are further away than max depth are ignored. Beyond a certain depth, object points are not reliable enough for pose estimation.
- reprojectionError - max allowed distance between best matches during pose estimation
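How the inliers and Max depth parameters gate the estimation can be sketched like this. Both helpers are hypothetical, and the inlier count is assumed to come from whatever RANSAC-style pose solver is used.

```python
def usable_points(points_3d, max_depth):
    """Return indices of points with a positive depth no larger than max_depth."""
    return [i for i, (_, _, z) in enumerate(points_3d) if 0 < z <= max_depth]


def accept_pose(inlier_count, min_inliers):
    """Accept a pose only when the solver reported enough inliers."""
    return inlier_count >= min_inliers


points = [(0.1, 0.2, 800.0), (0.0, 0.1, 4500.0), (0.3, 0.0, 0.0)]
print(usable_points(points, max_depth=3000.0))
print(accept_pose(inlier_count=12, min_inliers=20))
```

Returning indices rather than the points themselves makes it easy to select the matching 2D keypoints with the same list, so the 3D and 2D sets stay aligned.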
I haven't gotten too deep in the theory behind motion estimation, the jupyter notebook in the link below gives a really good summary.
- At this stage having synchronized cameras is a really good idea. I certainly have reliability issues during camera movement, resulting in not enough matches being found and thus the pose estimation step being skipped. This of course makes the trajectory drift.
- I introduced a filter in the trajectory generation. The motion estimator can give outliers, resulting in spikes in the translation vector. I guess it has to do with my camera setup, which is admittedly not too reliable.
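One simple way to suppress such spikes is to reject any per-frame translation longer than a plausible step. This is only an assumed example of a filter, since the exact one used in the app is not described here, and both `filter_translation` and `max_step` are hypothetical names.

```python
import math


def filter_translation(translation, max_step):
    """Return the translation, or a zero step when its length exceeds max_step."""
    length = math.sqrt(sum(c * c for c in translation))
    if length > max_step:
        # Implausibly large jump between two frames: treat it as an outlier.
        return (0.0, 0.0, 0.0)
    return tuple(translation)


steps = [(1.0, 0.0, 2.0), (250.0, -40.0, 900.0), (0.5, 0.1, 1.8)]
print([filter_translation(s, max_step=50.0) for s in steps])
```

Replacing an outlier with a zero step keeps the trajectory continuous, but the real motion during that frame is lost, so it still contributes to drift.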
https://github.com/FoamoftheSea/KITTI_visual_odometry
The final stage is creating the mapping of the environment. I kept this part to the absolute minimum, which is drawing the keypoints (landmarks) in absolute coordinates. There is room for a lot more improvement, like optimizing out duplicate keypoints and improving accuracy. But the current solution is enough for a proof of concept.
In order to start motion estimation press Start.
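Drawing keypoints in absolute coordinates boils down to transforming each camera-frame point with the current camera pose. A minimal sketch, assuming the pose is given as a camera-to-world rotation `R` and position `t` (function names are hypothetical):

```python
def to_world(points_cam, rotation, translation):
    """Map camera-frame points into the world frame: p_world = R * p_cam + t."""
    return [
        tuple(sum(rotation[i][k] * p[k] for k in range(3)) + translation[i]
              for i in range(3))
        for p in points_cam
    ]


def add_landmarks(landmark_map, points_cam, rotation, translation):
    """Append the world-frame version of the new keypoints to the map."""
    landmark_map.extend(to_world(points_cam, rotation, translation))
    return landmark_map


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
landmarks = add_landmarks([], [(1.0, 0.0, 3.0)], identity, [0.0, 0.0, 2.0])
print(landmarks)
```

Since the map is just a growing list, the same physical point seen in several frames is stored several times. That is the duplicate-keypoint issue mentioned above.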