Given a source image and an image of an empty billboard (the target), the code should automatically place the source image on the empty billboard in Times Square.
Clone the repository to your machine using the following command
git clone "https://github.com/ojaashampiholi/Projective-Transformation-Stereo-Matching.git"
Then change the directory to access the files as follows
cd Projective-Transformation-Stereo-Matching/
To test the application from the command line, run the command below, then annotate the target image as described further down.
python homography.py source_image_name target_image_name
First command line argument would be the name of the source image and the next argument would be the name of the target image.
Once the program runs, the source image is displayed in a window. Upon closing the source image window, the target image window appears. Annotate the four corners of the billboard in the target image (by clicking on the desired points with the mouse) and close the window. While annotating, make sure to start from the top left corner of the empty billboard and continue clockwise. Finally, the output window opens with the warped source image on the billboard.
1.The input source and target images are read and displayed.
2.Four corners on the target image are annotated by mouse clicks received from the user. The annotated target image and the coordinates of the annotated corners are saved.
3.The source image corners are defined starting at the top left corner, marked as (0,0), and continuing clockwise. The second pair is (width,0) at the top right corner, the third is (width,height) at the bottom right corner and the fourth is (0,height) at the bottom left corner. The order used while annotating the target image must match the order in which the source corners are defined, because this preserves the one-to-one point correspondence between the two images.
4.Estimating Homography Matrix:
4a.Defining matrix A: The homography matrix H is a 3x3 matrix with 9 unknowns. It can be computed from a linear system in which each of the four point correspondences contributes a 2×9 block of the form:
[[-xi, -yi, -1, 0, 0, 0, xi*ui, yi*ui, ui],
[0, 0, 0, -xi, -yi, -1, xi*vi, yi*vi, vi]]
where (xi, yi) is a point in the source image and (ui, vi) is the corresponding point in the target image. This form is obtained through the following steps:
- Write out the linear equation for each point correspondence
- Expand the matrix multiplication
- Eliminate the scale factor
- Rearrange the terms
- Rewrite in matrix form
One such 2x9 block is defined for each of the four point pairs. Stacking them together gives the matrix A of shape 8x9. Since there are 9 unknowns in the H matrix, we can append a row of zeros with one as its last element to the matrix A, making it 9x9.
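The construction of A can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code from `homography.py`: the function name `build_matrix_a`, the `append_constraint_row` flag, and the example coordinates are all hypothetical.

```python
def build_matrix_a(src_pts, dst_pts, append_constraint_row=True):
    """Stack two rows per correspondence (x, y) -> (u, v) into matrix A."""
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([-x, -y, -1, 0, 0, 0, x * u, y * u, u])
        rows.append([0, 0, 0, -x, -y, -1, x * v, y * v, v])
    if append_constraint_row:
        # Extra row [0, ..., 0, 1] as described above, making A 9x9.
        rows.append([0] * 8 + [1])
    return rows


# Hypothetical source size and hypothetical clicked billboard corners,
# both listed clockwise starting from the top left corner.
width, height = 4, 3
src = [(0, 0), (width, 0), (width, height), (0, height)]
dst = [(10, 20), (50, 22), (48, 60), (12, 58)]

A = build_matrix_a(src, dst)
print(len(A), "x", len(A[0]))
```

Passing `append_constraint_row=False` returns the plain 8x9 stack.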
4b.Least Squares method: To solve the homogeneous system of linear equations Ah = 0, we apply SVD to matrix A and take the right singular vector corresponding to the smallest singular value (equivalently, the eigenvector of A^T A with the smallest eigenvalue). This vector of shape (9×1), when reshaped to a 3×3 matrix, gives the homography matrix for our problem.
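A minimal NumPy sketch of this step is shown here. It is an assumption-laden illustration rather than the repository's implementation: the name `estimate_homography` is hypothetical, the sketch works on the 8x9 stack (leaving out the appended row so that A has an exact null vector), and it normalizes H so that its bottom right entry is 1, which assumes that entry is nonzero.

```python
import numpy as np


def estimate_homography(src_pts, dst_pts):
    """Estimate H (3x3) from four (x, y) -> (u, v) correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([-x, -y, -1, 0, 0, 0, x * u, y * u, u])
        rows.append([0, 0, 0, -x, -y, -1, x * v, y * v, v])
    A = np.array(rows, dtype=float)  # shape (8, 9)

    # Full SVD: the rows of vt are right singular vectors, ordered by
    # decreasing singular value, so the last row spans the null space.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)

    # H is only defined up to scale; fix the scale for readability.
    return H / H[2, 2]


src = [(0, 0), (4, 0), (4, 3), (0, 3)]              # hypothetical source corners
dst = [(10, 20), (50, 22), (48, 60), (12, 58)]      # hypothetical clicks
H = estimate_homography(src, dst)
```

A quick sanity check is to project each source corner with H and confirm that it lands on the matching annotated corner.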
5.Applying Homography matrix to the source image: For every pixel p(x,y) in the source image, we compute its projection in homogeneous coordinates as:
p^ = H · [x, y, 1]^T
The homogeneous coordinates need to be converted to cartesian coordinates. Upon dividing the first two entries of p^ by the third coordinate, we can get the projected coordinates p^(x^,y^) in the cartesian form. The pixel values of the source image at p(x,y) are pasted on the target image at the corresponding projected coordinate p^(x^,y^). The resultant image will have the warped source image on the target image.
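The projection and pasting step might look like the following pure Python sketch, which uses nested lists in place of real images. The helper names `project_point` and `warp_onto` are hypothetical, rounding to the nearest pixel is an assumption, and pixels that project outside the target are simply skipped.

```python
def project_point(H, x, y):
    """Apply H to (x, y, 1) and convert back to cartesian coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w


def warp_onto(source, target, H):
    """Paste every source pixel at its projected location in the target."""
    out = [row[:] for row in target]  # leave the input target untouched
    for y, row in enumerate(source):
        for x, pixel in enumerate(row):
            u, v = project_point(H, x, y)
            u, v = int(round(u)), int(round(v))
            if 0 <= v < len(out) and 0 <= u < len(out[0]):
                out[v][u] = pixel
    return out


# Hypothetical example: a pure translation by (2, 1).
H_shift = [[1, 0, 2], [0, 1, 1], [0, 0, 1]]
source = [[1, 2], [3, 4]]
target = [[0] * 5 for _ in range(4)]
result = warp_onto(source, target, H_shift)
```

Because this maps source pixels forward, a source image that is stretched by H can leave unfilled gaps in the target; looping over target pixels with the inverse of H is a common alternative.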
Reads the source and target images using the path provided while running the code
Displays the source and target images
Mouse event is triggered as and when a new point is clicked on the target image and the method annotate_image() gets called to mark the selected point on the image
This method defines the source image corners and builds matrix A in the format described above.
This method computes the SVD of matrix A and takes the singular vector corresponding to the smallest singular value as the estimate of the homography matrix.
Here, for every pixel in the source image, we calculate the projected coordinates, convert them from homogeneous form to cartesian form and paste the pixel value on the target image.
The Empty Billboard Image is
The Minion Image is
The Output Image is
Use two rectified images as the inputs to estimate the depth map of the scene and compare the results quantitatively and qualitatively with ground truth depth map provided for a pair of input images.
- Clone the repository to your machine using the following command:
git clone "https://github.com/ojaashampiholi/Projective-Transformation-Stereo-Matching.git"
- Change the directory to access the files as follows:
cd Projective-Transformation-Stereo-Matching/
- To test the application on the images use the following code:
python code_filename.py left_image right_image gt_image
For example:
python StereoMatching.py 1/0015_rgb_left.png 1/0015_rgb_right.png 1/0015_gt.png
python StereoViterbi.py 1/0015_rgb_left.png 1/0015_rgb_right.png 1/0015_gt.png
• Input: Images from both left and right camera are taken as the input by the program along with the ground truth depth map.
• All the images are converted to grayscale to speed up computation.
• If the input image is larger than a certain threshold size, it is resized, which further reduces computation time.
• Two scoring schemes are used to compute the depth map: Sum of Squared Differences (SSD) and Cross Correlation (cor).
• The sharpness and smoothness of the depth map depend on the Window Size and Maximum Offset, which can be tuned per use case.
• Output: The computed depth map is saved as output image.
• Performance Metrics: The ground truth and computed depth map are used to compute the end point error and error rate. These measures show how well application performs on input image pairs for depth estimation.
To improve upon the window-based technique, a dynamic programming algorithm, the Viterbi algorithm, has been implemented. It uses the values from neighboring pixels to find the depth of the current pixel.
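To illustrate the idea, here is a minimal sketch of Viterbi-style matching along a single scanline. It is not the code in `StereoViterbi.py`: the function name `viterbi_scanline`, the squared-difference data cost, the linear smoothness penalty and the `smooth` weight are all assumptions chosen for the example.

```python
import numpy as np


def viterbi_scanline(left_row, right_row, max_offset=4, smooth=10.0):
    """Choose one offset per pixel by minimising data cost plus a
    penalty for offset changes between neighbouring pixels."""
    left_row = np.asarray(left_row, dtype=float)
    right_row = np.asarray(right_row, dtype=float)
    n = len(left_row)
    ds = np.arange(max_offset)

    # data[c, d]: cost of matching left pixel c with right pixel c - d.
    # Offsets that would fall outside the image get a very large cost.
    data = np.full((n, max_offset), 1e9)
    for d in range(max_offset):
        data[d:, d] = (left_row[d:] - right_row[:n - d]) ** 2

    # trans[d_prev, d]: smoothness penalty for changing the offset.
    trans = smooth * np.abs(ds[:, None] - ds[None, :])

    cost = data[0].copy()
    back = np.zeros((n, max_offset), dtype=int)
    for c in range(1, n):
        total = cost[:, None] + trans          # total[d_prev, d]
        back[c] = np.argmin(total, axis=0)     # best predecessor per d
        cost = total[back[c], ds] + data[c]

    # Backtrack from the cheapest final state.
    path = np.zeros(n, dtype=int)
    path[-1] = int(np.argmin(cost))
    for c in range(n - 1, 0, -1):
        path[c - 1] = back[c, path[c]]
    return path


# Hypothetical scanline: the left row is the right row shifted by 2 pixels.
right_row = [10, 50, 20, 80, 30, 90, 40, 70, 60, 100]
left_row = [0, 0] + right_row[:-2]
offsets = viterbi_scanline(left_row, right_row)
```

Running this per row and scaling the offsets would give a depth map; the smoothness term is what suppresses the isolated wrong matches that window-based scoring tends to produce.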
This method takes the input image and a resizing factor as input. The resizing factor must be a number between 0.25 and 1, where 0.25 means the image is reduced to 1/4th of its original size and 1 means the image size is unchanged. The resized image is returned as output.
This method takes left and right input images along with row, column, window size and offset information as input and computes the sum of squared differences between left and right input images which is returned as output. The formula for the same has been shown below:
This method takes left and right input images along with row, column, window size and offset information as input and computes the cross correlation between left and right input images which is returned as output. The formula for the same has been shown below:
This method takes left and right images as input along with window size, maximum offset, and type of scoring method to be used. If the scoring type is ‘ssd’, then sum of squared difference scoring is used. If the scoring type is ‘cc’, then cross correlation scoring is used. Offset factor is calculated as 255 / maxOffset. This is done to ensure that pixel values in depth map always lie between 0 and 255.
For each pixel in the left image, the window around it is compared with the corresponding window in the right image at every candidate offset. The offset with the lowest score is chosen as the output offset level. This offset is multiplied by the offset factor to convert the calculated disparity into a pixel value for the depth map. The depth map is returned as output by this method.
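The window search described above can be sketched with NumPy as follows, using SSD scoring only. The names `ssd_score` and `disparity_map`, the default parameter values, the search range of `0` to `max_offset - 1`, and the choice to shift the right window to the left are assumptions made for this example, not details confirmed by the repository.

```python
import numpy as np


def ssd_score(left, right, row, col, half, offset):
    """Sum of squared differences between the left window and the
    right window shifted `offset` pixels to the left."""
    lw = left[row - half:row + half + 1, col - half:col + half + 1]
    rw = right[row - half:row + half + 1,
               col - half - offset:col + half + 1 - offset]
    return float(np.sum((lw - rw) ** 2))


def disparity_map(left, right, window_size=3, max_offset=4):
    left = left.astype(float)    # avoid integer overflow when squaring
    right = right.astype(float)
    half = window_size // 2
    height, width = left.shape
    offset_factor = 255 / max_offset  # maps offsets into the 0-255 range
    depth = np.zeros((height, width), dtype=np.uint8)

    for row in range(half, height - half):
        for col in range(half, width - half):
            best_offset, best_score = 0, float("inf")
            for offset in range(max_offset):
                if col - half - offset < 0:
                    break  # shifted window would leave the image
                score = ssd_score(left, right, row, col, half, offset)
                if score < best_score:
                    best_score, best_offset = score, offset
            depth[row, col] = int(best_offset * offset_factor)
    return depth
```

Border pixels that cannot hold a full window are left at 0 in this sketch.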
This function takes depth map and ground truth images as input and calculates the end point error between images which is returned as output. The implementation uses following formula:
This function takes depth map and ground truth images as input and calculates the error rate between images which is returned as output. The implementation uses following formula:
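For reference, here is a sketch of the two metrics under commonly used definitions, which may differ from the formulas used in this repository: end point error taken as the mean absolute difference between the computed and ground truth maps, and error rate taken as the fraction of pixels whose absolute difference exceeds a threshold. The function names and the default threshold of 3 are assumptions.

```python
import numpy as np


def end_point_error(depth_map, ground_truth):
    """Mean absolute difference between the two maps (assumed definition)."""
    diff = np.abs(depth_map.astype(float) - ground_truth.astype(float))
    return float(diff.mean())


def error_rate(depth_map, ground_truth, threshold=3.0):
    """Fraction of pixels whose error exceeds `threshold` (assumed definition)."""
    diff = np.abs(depth_map.astype(float) - ground_truth.astype(float))
    return float((diff > threshold).mean())


# Tiny hypothetical example.
computed = np.array([[10, 20], [30, 40]], dtype=np.uint8)
truth = np.array([[10, 22], [35, 40]], dtype=np.uint8)
epe = end_point_error(computed, truth)
rate = error_rate(computed, truth)
```

Casting to float before subtracting matters here, because subtracting `uint8` arrays directly wraps around instead of producing negative values.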
This method is the main method which implements all the steps mentioned in Algorithm using the above support functions.
Overall, window-based matching using SSD and cross correlation captured the depths in the images but produced a noisy depth map, whereas the Viterbi algorithm produced a smoother depth map that looked much closer to the ground truth. The outputs for all the given images have been added to the repo. Some qualitative and quantitative results are shown here.
1.Cross Correlation
2.SSD
3.Viterbi
As the results show, all three methods capture some depth information, but the Viterbi algorithm gives a clear improvement over window-based matching.