
2D to 3D Calibration


Methods:

We calibrated each camera with sample videos and a checkerboard.
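
For reference, a minimal sketch of the per-camera checkerboard calibration with OpenCV, assuming a 9x6 inner-corner board and a frames/ directory of sampled video frames (both names are assumptions, not the project's actual layout):

    import glob
    import cv2
    import numpy as np

    PATTERN = (9, 6)  # inner corners per row/column (assumed board size)

    # Board points on the z=0 plane, one unit per checkerboard square.
    objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

    obj_points, img_points = [], []
    for path in sorted(glob.glob("frames/*.png")):  # hypothetical frames
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, PATTERN)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    # Intrinsics (camera_matrix) and distortion coefficients per camera.
    ret, camera_matrix, dist_coeffs, _, _ = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)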

A room may have many videos recorded as the camera is turned on and off, so the scene can shift between recordings. We correct for that as follows:

  1. Undistorting the videos using the best calibration for the room from the step above ("best" being the calibration that returns the straightest lines in the room).

  2. All videos are sampled and the scenes are stitched together using cv2.goodFeaturesToTrack with the Harris detector. This method focuses specifically on corners and edge intersections, which rooms have in abundance. (Steps 1-2 are sketched in the first code block after this list.)

  3. Corners can lie outside the frame or be occluded by walls from the camera's perspective. A well-calibrated image has straight lines, so each wall edge can be estimated by regression with np.polyfit (degree=2 in our case; poorly calibrated images can use a higher degree, but our lines fit well). We perform the regression on 4 manually labeled points along each desired wall edge; where 2 fitted edges intersect, we extend the image with a black border and draw the intersection point (see the regression sketch after this list).

  4. These intersection points let us define a perspective grid. The 4 corners are selected and a homography is calculated to map them onto a grid, superimposing a grid that you can align with the room edges, walls, and floor features (see the homography sketch after this list).

  5. The grid is then used to select the 2D positions of the real-world points. This lets us accurately grab corners even when they are occluded or beyond the camera's view. We can also increase the grid resolution to select other points; for example, we can split the room into tenths.

  6. The real-world points and the selected 2D points are then passed to cv2.solvePnP(), which computes the camera pose (a rotation and a translation). This pose is used to project real-world locations onto the screen (2D), or inversely to lift 2D points to a 3D location at a defined height; across a range of heights, a vector of possible answers exists (see the projection sketch after this list).
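
A minimal sketch of steps 1-2, assuming a single sampled frame; the file name and the placeholder intrinsics are hypothetical (in practice camera_matrix and dist_coeffs come from cv2.calibrateCamera as above):

    import cv2
    import numpy as np

    frame = cv2.imread("sample_frame.png")  # hypothetical sampled frame

    # Placeholder intrinsics so the snippet runs stand-alone; in practice
    # use the camera_matrix / dist_coeffs from cv2.calibrateCamera above.
    camera_matrix = np.float64([[1000, 0, 640], [0, 1000, 360], [0, 0, 1]])
    dist_coeffs = np.zeros(5)

    # Step 1: remove lens distortion with the room's best calibration.
    undistorted = cv2.undistort(frame, camera_matrix, dist_coeffs)

    # Step 2: strong Harris corners for stitching scenes across recordings.
    gray = cv2.cvtColor(undistorted, cv2.COLOR_BGR2GRAY)
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01,
                                      minDistance=10, useHarrisDetector=True,
                                      k=0.04)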
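
A sketch of step 3's regression with hypothetical labeled points: fit each wall edge with np.polyfit and intersect the two fitted lines analytically (degree 1 shown for clarity; the same idea applies at degree 2):

    import numpy as np

    # Four manually labeled (x, y) points along each of two wall edges
    # (coordinates are hypothetical).
    edge_a = np.float64([(100, 400), (300, 380), (500, 360), (700, 340)])
    edge_b = np.float64([(650, 100), (660, 250), (670, 400), (680, 550)])

    # Fit y = m*x + b to each edge (degree 1; raise for residual distortion).
    m1, b1 = np.polyfit(edge_a[:, 0], edge_a[:, 1], 1)
    m2, b2 = np.polyfit(edge_b[:, 0], edge_b[:, 1], 1)

    # Intersection of the two fitted lines; it may land outside the frame,
    # which is why the image is extended with a black border before drawing.
    x = (b2 - b1) / (m1 - m2)
    y = m1 * x + b1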
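
A sketch of steps 4-5 with hypothetical corner coordinates: cv2.getPerspectiveTransform maps a unit grid onto the four corners, and cv2.perspectiveTransform samples that grid at any resolution (tenths here) to read off image positions, even for points that are occluded or outside the view:

    import numpy as np
    import cv2

    # Four room corners in the image, e.g. from the intersections above
    # (hypothetical pixel coordinates).
    img_corners = np.float32([(120, 90), (980, 110), (1040, 640), (60, 600)])
    grid_corners = np.float32([(0, 0), (1, 0), (1, 1), (0, 1)])  # unit square

    # Homography from normalized grid coordinates to image coordinates.
    H = cv2.getPerspectiveTransform(grid_corners, img_corners)

    # Sample the grid in tenths: the image position of any floor point,
    # even one that is occluded or past the camera's field of view.
    grid_pts = np.float32([[(i / 10, j / 10)]
                           for i in range(11) for j in range(11)])
    img_pts = cv2.perspectiveTransform(grid_pts, H)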
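
And a sketch of step 6's projection, again with hypothetical numbers: cv2.solvePnP recovers the camera pose, cv2.projectPoints handles the world-to-screen direction, and the inverse direction intersects a back-projected ray with a plane at the assumed height:

    import numpy as np
    import cv2

    # Hypothetical intrinsics and corner correspondences (metres / pixels).
    camera_matrix = np.float64([[1000, 0, 640], [0, 1000, 360], [0, 0, 1]])
    room_points = np.float64([(0, 0, 0), (5, 0, 0), (5, 4, 0), (0, 4, 0)])
    image_points = np.float64([(120, 90), (980, 110), (1040, 640), (60, 600)])

    ok, rvec, tvec = cv2.solvePnP(room_points, image_points, camera_matrix, None)

    # Forward: project a 3D room location onto the screen.
    projected, _ = cv2.projectPoints(np.float64([(2.5, 2.0, 0.0)]),
                                     rvec, tvec, camera_matrix, None)

    # Inverse: a pixel back-projects to a ray; intersecting that ray with the
    # plane z = height picks one 3D point, so sweeping a range of heights
    # gives the vector of possible answers described in step 6.
    R, _ = cv2.Rodrigues(rvec)

    def pixel_to_world(u, v, height):
        ray_cam = np.linalg.inv(camera_matrix) @ np.array([u, v, 1.0])
        origin = -R.T @ tvec.ravel()   # camera centre in world coordinates
        direction = R.T @ ray_cam      # ray direction in world coordinates
        s = (height - origin[2]) / direction[2]
        return origin + s * direction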

Our issue is that solvePnP produces poor coordinates when projecting the room locations into our 2D locations. We have checked this by re-defining the real-world coordinates to account for measurement errors, but the distortion persists. We have explored all the solvePnP methods (I forget the exact outcomes, but none solved the issue):

    obj_pts = room_points.astype(np.float64)
    img_pts = self.corners.astype(np.float64)

    # flags must be passed by keyword: positionally, the fifth argument of
    # cv2.solvePnP is an initial rvec guess, not the method flag.
    # cv2.SOLVEPNP_MAX_COUNT is the enum size, not a solver, so it is omitted.
    for flag in (cv2.SOLVEPNP_IPPE, cv2.SOLVEPNP_EPNP,
                 cv2.SOLVEPNP_ITERATIVE, cv2.SOLVEPNP_UPNP):
        retval, rvec, tvec = cv2.solvePnP(
            obj_pts, img_pts, self.camera_matrix, None, flags=flag)

    retval, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, self.camera_matrix, None)