About some hidden bugs in monocular image fitting and texture extraction, and corresponding solutions.
icewired-yy opened this issue · 0 comments
I ran into several hidden bugs while implementing the image fitting and texture extraction procedures following the official demo. The rendered result, shown side by side with the original image, is not aligned with the original face, and the texture extracted from the fitted result is also incorrect. I will show some modifications here to help anyone facing the same problems.
To begin with, my relevant package versions are:
Python 3.9.18
dlib 19.24.2
Descriptions:
The result I get when processing chan.jpg following the official demo is:
landmark:
fitting result:
texture:
It is interesting that the model seems to fit the face in the image well, so why is the rendered result not aligned with the face?
I reviewed the official code and found that there are several Y-axis flipping operations in landmark detection and texture extraction, such as this code in facescape_fitter.py:
def detect_kp2d(self, src_img):
    ...
    elif self.kp2d_backend == 'dlib':
        faces = self.detector(sc_img, 1)
        pts = self.face_pred(sc_img, faces[0])
        # Y-axis Flipping
        kp2d = np.array([[p.x * fp_scale, src_img.shape[0] - p.y * fp_scale - 1] for p in pts.parts()])
    return kp2d
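The flip above converts between image coordinates (origin at the top-left, y pointing down) and a y-up convention. A minimal sketch of the round trip, where `h` is a fabricated stand-in for `src_img.shape[0]`:

```python
import numpy as np

h = 480  # hypothetical image height, i.e. src_img.shape[0]

# A point 10 px below the top edge, in image coordinates (y down).
p_img = np.array([100.0, 10.0])

# Flip to a y-up convention, as the official landmark code does.
p_up = np.array([p_img[0], h - p_img[1] - 1])

# Applying the same flip again recovers the original image coordinate.
p_back = np.array([p_up[0], h - p_up[1] - 1])
```

The flip is its own inverse, which is why a single mismatched flip anywhere in the pipeline silently mirrors the result instead of raising an error.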
And,
def get_texture(self, img, verts_img, mesh):
    ...
    for face in mesh.faces:
        face_vertices, face_normals, tc, material = face
        ...
        # Y-axis Flipping
        tri1 = np.float32([[[(h - int(verts_img[face_vertices[0] - 1, 1])),
                             int(verts_img[face_vertices[0] - 1, 0])],
                            [(h - int(verts_img[face_vertices[1] - 1, 1])),
                             int(verts_img[face_vertices[1] - 1, 0])],
                            [(h - int(verts_img[face_vertices[2] - 1, 1])),
                             int(verts_img[face_vertices[2] - 1, 0])]]])
        tri2 = np.float32(
            [[[4096 - self.texcoords[tc[0] - 1][1] * 4096, self.texcoords[tc[0] - 1][0] * 4096],
              [4096 - self.texcoords[tc[1] - 1][1] * 4096, self.texcoords[tc[1] - 1][0] * 4096],
              [4096 - self.texcoords[tc[2] - 1][1] * 4096, self.texcoords[tc[2] - 1][0] * 4096]]])
        r1 = cv2.boundingRect(tri1)
        r2 = cv2.boundingRect(tri2)
        ...
    return texture
These operations seem to make the fitted mesh compatible with the pyrender pipeline, but in my case they lead to incorrect results.
Solutions
I made the following modifications, which remove the Y-axis flipping:
def detect_kp2d(self, src_img):
    ...
    elif self.kp2d_backend == 'dlib':
        faces = self.detector(sc_img, 1)
        pts = self.face_pred(sc_img, faces[0])
        # modified: keep the landmarks in image coordinates (no Y-axis flip)
        kp2d = np.array([[p.x * fp_scale, p.y * fp_scale] for p in pts.parts()])
    return kp2d
def get_texture(self, img, verts_img, mesh):
    ...
    for face in mesh.faces:
        face_vertices, face_normals, tc, material = face
        ...
        # modified: no Y-axis flip for the image-space triangle;
        # the texture-space triangle (tri2) keeps its flip
        tri1 = np.float32([[[int(verts_img[face_vertices[0] - 1, 1]),
                             int(verts_img[face_vertices[0] - 1, 0])],
                            [int(verts_img[face_vertices[1] - 1, 1]),
                             int(verts_img[face_vertices[1] - 1, 0])],
                            [int(verts_img[face_vertices[2] - 1, 1]),
                             int(verts_img[face_vertices[2] - 1, 0])]]])
        tri2 = np.float32(
            [[[4096 - self.texcoords[tc[0] - 1][1] * 4096, self.texcoords[tc[0] - 1][0] * 4096],
              [4096 - self.texcoords[tc[1] - 1][1] * 4096, self.texcoords[tc[1] - 1][0] * 4096],
              [4096 - self.texcoords[tc[2] - 1][1] * 4096, self.texcoords[tc[2] - 1][0] * 4096]]])
        r1 = cv2.boundingRect(tri1)
        r2 = cv2.boundingRect(tri2)
        ...
    return texture
Also, in some cases the cropped image in get_texture will have zero width or zero height, so we can add a check before using img1Cropped to make the code more robust:
# Apply warpImage to small rectangular patches
img1Cropped = img[croppedRectangleInImage[0]:(croppedRectangleInImage[0] + croppedRectangleInImage[2]),
                  croppedRectangleInImage[1]:(croppedRectangleInImage[1] + croppedRectangleInImage[3])]
# ADD THIS: skip the face if the cropped image is empty
if img1Cropped.shape[0] == 0 or img1Cropped.shape[1] == 0:
    continue
warpMat = cv2.getAffineTransform(np.float32(triangleVertexPixelCoordInCroppedImage),
                                 np.float32(triangleVertexPixelCoordInCroppedTexture))
# Get mask by filling triangle
...
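The guard matters because a bounding rectangle whose origin falls outside the image (e.g. a fitted vertex projected past the border) produces an empty NumPy slice, and feeding an empty array to later OpenCV calls raises an error. A small demonstration with fabricated rectangle values:

```python
import numpy as np

img = np.zeros((100, 100), dtype=np.uint8)

# Hypothetical bounding-rect output whose x-origin lies beyond the image.
x, y, w, h = 150, 20, 10, 10
crop = img[y:y + h, x:x + w]
print(crop.shape)  # → (10, 0): the column slice is clipped to nothing

# The added guard skips such triangles instead of crashing.
if crop.shape[0] == 0 or crop.shape[1] == 0:
    pass  # continue, as in the loop above
```

NumPy clamps out-of-range slices to empty rather than raising, so without the explicit check the failure only surfaces downstream.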
After modifications:
landmark detection:
texture extraction:
To align the rendered result with the face in the original image, we only need to add a Y-axis flip to the mesh vertices:
...
orthoRenderer = OrthoRenderModule(resolution=img.shape)
mesh_trimesh = trimesh.Trimesh(vertices=mesh.vertices.copy(),
faces=fitModule.faceScapeBilinearModel.fv_indices_front.copy() - 1,
process=False)
mesh_trimesh.vertices[:, :2] = mesh_trimesh.vertices[:, :2] - np.array([img.shape[1] / 2, img.shape[0] / 2])
mesh_trimesh.vertices = mesh_trimesh.vertices / img.shape[0] * 2
mesh_trimesh.vertices[:, 2] = mesh_trimesh.vertices[:, 2] - 10
# Add Y-axis flipping operation here:
mesh_trimesh.vertices[:, 1] = -mesh_trimesh.vertices[:, 1]
renderedImage, renderedDepth = orthoRenderer(mesh_trimesh, (1, 1), zNear=0.05, zFar=10000)
...
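The normalization above maps pixel-space vertices into an orthographic, NDC-like cube: center x/y at the image center, scale by the image height, push z back, then flip y because image y points down while the renderer's y points up. A numeric sketch with a fabricated image size and vertices:

```python
import numpy as np

h, w = 480, 640  # hypothetical img.shape[:2]
verts = np.array([[320.0, 240.0, 5.0],   # image center
                  [320.0,   0.0, 5.0]])  # top edge of the image (y = 0)

# Same steps as above: center x/y, scale by image height, offset z, flip y.
verts[:, :2] -= np.array([w / 2, h / 2])
verts = verts / h * 2        # equivalent to "/ img.shape[0] * 2"
verts[:, 2] -= 10
verts[:, 1] = -verts[:, 1]

# The image center lands at (0, 0) and the top edge at y = +1,
# i.e. upward in the renderer's y-up clip space.
```

Without the final flip, the top of the image would map to y = -1, which is exactly the vertical mirroring seen in the misaligned render.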