The project focuses on detecting a custom AR Tag (a form of fiducial marker) that is used to obtain a point of reference in the real world, such as in augmented reality applications. The two aspects of using an AR Tag, detection and tracking, have been implemented in this project. The two stages are:
- Detection: Involves finding the AR Tag from a given image sequence
- Tracking: Involves keeping the tag in “view” throughout the sequence and performing image processing operations based on the tag’s orientation and position (a.k.a. the pose).
Prior to performing image processing on the image sequence, the video is split into its individual frames using cv2.VideoCapture. Once the operations have been performed on each frame, the results are appended to an array, and this image array is then used to rebuild the video using cv2.VideoWriter.
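A minimal sketch of this frame extraction and reassembly step is shown below; the input file name, output codec, and the 25 fps rate are illustrative placeholders, not the project's exact values.

```python
import cv2

# Split the input video into frames (illustrative file name).
cap = cv2.VideoCapture("Tag0.mp4")
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

# ... image processing operations are performed on each frame here ...

# Rebuild the video from the processed frames at 25 fps.
height, width = frames[0].shape[:2]
writer = cv2.VideoWriter("output.avi",
                         cv2.VideoWriter_fourcc(*"XVID"), 25, (width, height))
for frame in frames:
    writer.write(frame)
writer.release()
```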
Following is the reference AR Tag that has to be detected and tracked:
AR Tags facilitate the appearance of virtual objects, games, and animations within the real world. The analysis of these tags can be done as follows.
- The tag has been decomposed into an 8*8 grid of squares, which includes a padding two squares wide along the borders. This allows easy detection of the tag when it is placed on a white background.
- The inner 4*4 grid (i.e. after removing the padding) has the orientation depicted by a white square in the lower-right corner. This represents the upright position of the tag. This is different for each of the tags provided in the different image sequences.
- Lastly, the inner-most 2*2 grid (i.e. after removing the padding and the orientation grids) encodes the binary representation of the tag’s ID, which is ordered in the clockwise direction from least significant bit to most significant. So, the top-left square is the least significant bit, and the bottom-left square is the most significant bit.
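To make the encoding concrete, below is a minimal sketch of how a tag laid out this way could be decoded once it has been warped to a square, grid-aligned image. The 160-pixel size, the threshold value, and the function name are illustrative assumptions, not the project's actual code.

```python
import numpy as np
import cv2

def decode_tag(tag_img, size=160):
    """Illustrative sketch: read orientation and ID from a square, grid-aligned tag image."""
    gray = cv2.cvtColor(tag_img, cv2.COLOR_BGR2GRAY)
    cell = size // 8
    # Average each of the 8*8 cells and threshold to 0/1.
    grid = np.zeros((8, 8), dtype=int)
    for r in range(8):
        for c in range(8):
            patch = gray[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
            grid[r, c] = 1 if patch.mean() > 127 else 0

    inner = grid[2:6, 2:6]            # inner 4*4 grid after removing the padding
    # Rotate until the white orientation square sits in the lower-right corner.
    for rot in range(4):
        if inner[3, 3] == 1:
            break
        inner = np.rot90(inner)
    else:
        return None, None             # no orientation corner found

    # Inner 2*2 grid: bits ordered clockwise from the top-left (LSB)
    # to the bottom-left (MSB), as described above.
    bits = [inner[1, 1], inner[1, 2], inner[2, 2], inner[2, 1]]
    tag_id = sum(b << i for i, b in enumerate(bits))
    return tag_id, rot                # rot = number of 90-degree rotations applied
```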
Following are the AR Tags obtained by applying a perspective transformation to the image sequences.
The process of the edge and corner detection has been implemented in the following way:
- The video stream is first converted into image frames.
- Detection is performed on each of the frames, and the video is then reassembled at 25 fps.
- In the code, the function Edgedetecion has been scripted, which takes in the image and the previously computed corners, and returns the new coordinates of the detected corners. This is repeated for each image frame.
- The Computer Vision methods made use of in the above implementation are listed below (a minimal sketch combining these calls is given after the steps below):
  - cv2.cvtColor
  - cv2.medianBlur
  - cv2.threshold
  - cv2.findContours
  - cv2.arcLength
  - cv2.approxPolyDP
  - cv2.contourArea
  - cv2.isContourConvex
- Once the corners are successfully detected, the perspective transformation of the tag is performed. The function perspective for tag has been scripted to get the transformed and resized tag image. The methods used for the transformation are cv2.findHomography, cv2.warpPerspective, and cv2.resize.
- After the above transformation, the corners of the tag as well as its ID with respect to its original orientation, i.e. compensated for any camera rotation, are obtained.
- The above process has been written in Python using several computer vision functionalities such as contour and corner detection algorithms.
- Thus this part of the project successfully identifies and detects the AR tag in any given image sequence.
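The project's own Edgedetecion and perspective-transformation functions are in the repository code; the sketch below only illustrates, under assumed parameter values (blur kernel, threshold, area limits, output size) and the OpenCV 4 return signature of cv2.findContours, how the listed OpenCV calls can be combined to find a convex quadrilateral tag contour and warp it to a square tag image.

```python
import numpy as np
import cv2

def detect_tag_corners(frame):
    """Sketch of the contour-based corner detection described above."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.medianBlur(gray, 5)
    _, thresh = cv2.threshold(blur, 190, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    corners = []
    for cnt in contours:
        peri = cv2.arcLength(cnt, True)
        approx = cv2.approxPolyDP(cnt, 0.02 * peri, True)
        # Keep convex quadrilaterals within a plausible area range
        # (the thresholds here are illustrative, not the project's values).
        if (len(approx) == 4 and cv2.isContourConvex(approx)
                and 500 < cv2.contourArea(approx) < 50000):
            corners.append(approx.reshape(4, 2))
    return corners

def warp_tag(frame, quad, size=160):
    """Sketch of the perspective transformation of the detected tag."""
    dst = np.array([[0, 0], [size - 1, 0],
                    [size - 1, size - 1], [0, size - 1]], dtype=np.float32)
    H, _ = cv2.findHomography(quad.astype(np.float32), dst)
    tag = cv2.warpPerspective(frame, H, (size, size))
    return cv2.resize(tag, (size, size))
```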
Screenshot of Tag Identification and Edge Detection Output
Homographic transformation is the transformation between two planes in projective space. In augmented reality, it is the transformation that projects a square marker board from the world coordinate system into a polygon in the image plane on the camera sensor (i.e., pixel coordinates). In this project, once the four corners of the tag are obtained from the edge and corner detection described above, homography estimation is performed. This is used to superimpose an image over the tag, e.g. the following image of lena.jpg.
The following steps have been successfully implemented in this part of the project:
- The homography between the corners of the template and the four corners of the tag has been computed using the corners of the AR tag detected in the first part of the project. The homography matrix has been calculated manually in the code by taking the SVD of the system built from the world and camera coordinates. The SVD function from NumPy's linear algebra module has been used to calculate the same.
- Once the template image has been successfully transformed as above, the tag is “replaced” with the transformed image.
- This also implies that the orientation of the transformed template image must match that of the tag at any given frame, which has been taken care of using efficient corner, contour and edge detection techniques to get the correct orientation of the AR tag at any given time.
- The above has been achieved by applying the calculated homography matrix via cv2.warpPerspective, along with the CV function cv2.fillConvexPoly (a minimal sketch of these two steps follows this list).
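The following is a rough sketch of these two steps under stated assumptions: the homography is computed by hand from the standard DLT system solved with numpy.linalg.svd, and the template is then warped over the tag region using cv2.warpPerspective and cv2.fillConvexPoly. Corner ordering and variable names are illustrative, not taken from the project code.

```python
import numpy as np
import cv2

def homography_svd(src, dst):
    """Compute the homography mapping src points to dst points via SVD (DLT sketch).

    src, dst: 4x2 arrays of corresponding (x, y) points.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(A)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)          # null-space vector gives the homography
    return H / H[2, 2]

def superimpose(frame, template, tag_corners):
    """Warp the template onto the tag region of the frame (illustrative)."""
    h, w = template.shape[:2]
    src = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=float)
    H = homography_svd(src, tag_corners.astype(float))
    warped = cv2.warpPerspective(template, H, (frame.shape[1], frame.shape[0]))
    # Blank out the tag region, then place the warped template over it.
    cv2.fillConvexPoly(frame, tag_corners.astype(np.int32), 0)
    return cv2.bitwise_or(frame, warped)
```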
Following are the screenshots of the superimposed images of Lena.png on the AR Tag taken during the course of the video. The images show that the superimposed image maintains its orientation, thus compensating for the rotation of the camera.
Here the process of “projecting” a 3D shape onto a 2D image has been implemented.
- The homography between the world coordinates (the reference AR tag) and the image plane (the tag in the image sequence) is first computed. This has been calculated by detecting the four corners of the marker in the input image captured by the camera (i.e. the true corners for the UPRIGHT orientation of the marker).
- The world coordinates of the corners are determined, and the homography is computed between the detected corner points in the image (pixel coordinates) and the corresponding points in the reference (world) coordinate system.
- The projection matrix is then built from the provided camera calibration matrix and the homography matrix. Here the camera’s pose, i.e. the rotation matrix R and translation vector t, is found as shown in the code, and thus the projection matrix P = K[R|t] is constructed.
- Assuming that the virtual cube is sitting on “top” of the marker, and that the Z axis is negative in the upwards direction, we have obtained the coordinates of the other four corners of the cube. This allows the transformation of all the corners of the cube onto the image plane using the projection matrix.
- The cube is then drawn using cv2.drawContours (a minimal sketch of this projection and drawing step follows this list).
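A minimal sketch of this step is given below, assuming H maps the reference (world) tag corners, laid out as a size-by-size square, to the image, and that K is the provided calibration matrix. The pose-recovery scaling, the cube edge length, the colours, and the function names are assumptions rather than the project's exact implementation.

```python
import numpy as np
import cv2

def projection_matrix(K, H):
    """Recover [R|t] from the homography and build P = K[R|t] (sketch)."""
    # B = K^-1 H is proportional to [r1 r2 t].
    B = np.linalg.inv(K) @ H
    if np.linalg.det(B) < 0:
        B = -B
    lam = 2.0 / (np.linalg.norm(B[:, 0]) + np.linalg.norm(B[:, 1]))
    r1 = lam * B[:, 0]
    r2 = lam * B[:, 1]
    r3 = np.cross(r1, r2)
    t = lam * B[:, 2]
    return K @ np.column_stack((r1, r2, r3, t))

def draw_cube(frame, P, size=160):
    """Project a cube of edge `size` sitting on the tag and draw its edges."""
    # Bottom face on the marker plane (Z = 0), top face at Z = -size
    # (Z taken negative in the upward direction, as described above).
    corners = np.array([[0, 0, 0, 1], [size, 0, 0, 1],
                        [size, size, 0, 1], [0, size, 0, 1],
                        [0, 0, -size, 1], [size, 0, -size, 1],
                        [size, size, -size, 1], [0, size, -size, 1]], dtype=float)
    img_pts = (P @ corners.T).T
    img_pts = (img_pts[:, :2] / img_pts[:, 2:3]).astype(np.int32)

    frame = cv2.drawContours(frame, [img_pts[:4]], -1, (0, 255, 0), 3)   # bottom face
    frame = cv2.drawContours(frame, [img_pts[4:]], -1, (0, 0, 255), 3)   # top face
    for i in range(4):                                                   # vertical edges
        p0 = tuple(int(v) for v in img_pts[i])
        p1 = tuple(int(v) for v in img_pts[i + 4])
        frame = cv2.line(frame, p0, p1, (255, 0, 0), 3)
    return frame
```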
The implementation of all of the above steps has been clearly shown in the code. The following screenshot of the video shows the placement of the virtual 3D cube on the AR Tag.
Implementing AR detection and tracking helped us understand several concepts and gave us an appreciation of the challenges and the mathematics behind digital images and cameras. The practical implementation of the transformation between world and camera coordinates was exciting, as we saw the beauty of linear algebra at work. We understood the concept of homography in computer vision and the use of its libraries in Python. The process of edge and corner detection required several rounds of optimization to get the best result, which helped us understand the concept of contours and various detection algorithms.
Koyal Bhartia (Koyal-Bhartia) - Graduate student at the University of Maryland, majoring in Robotics. Interested in computer vision and machine learning for autonomous robots.
Harsh Kakashaniya (harshkakashaniya) - Graduate student at the University of Maryland, majoring in Robotics. Interested in medical, mobile and agricultural robotics.