About some hidden bugs in monocular image fitting and texture extraction, and corresponding solutions.
icewired-yy opened this issue · 0 comments
I ran into several hidden bugs while implementing the image fitting and texture extraction procedures following the official demo. The rendered result, shown side by side with the original image, is not aligned with the original face, and the texture extracted from the fitted result is also incorrect. I will show some modifications here to help anyone facing the same problems.
To begin with, my relevant package versions are:
Python 3.9.18
dlib 19.24.2
Descriptions:
The result I get when processing chan.jpg following the official demo is:
landmark:
fitting result:
texture:
It is interesting that the model seems to fit the face in the image well, so why is the rendered result not aligned with the face?
I reviewed the official code and found that there are several Y-axis flipping operations in landmark detection and texture extraction, such as this code in facescape_fitter.py:
def detect_kp2d(self, src_img):
    ...
    elif self.kp2d_backend == 'dlib':
        faces = self.detector(sc_img, 1)
        pts = self.face_pred(sc_img, faces[0])
        # Y-axis Flipping
        kp2d = np.array([[p.x * fp_scale, src_img.shape[0] - p.y * fp_scale - 1] for p in pts.parts()])
    return kp2d
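The flip above converts between image coordinates (origin at the top-left, y pointing down) and a y-up convention. A minimal sketch of the round trip, where `h` is a fabricated stand-in for `src_img.shape[0]`:

```python
import numpy as np

h = 480  # hypothetical image height, i.e. src_img.shape[0]

# A point 10 px below the top edge, in image coordinates (y down).
p_img = np.array([100.0, 10.0])

# Flip to a y-up convention, as the official landmark code does.
p_up = np.array([p_img[0], h - p_img[1] - 1])

# Applying the same flip again recovers the original image coordinate.
p_back = np.array([p_up[0], h - p_up[1] - 1])
```

The flip is its own inverse, which is why a single mismatched flip anywhere in the pipeline silently mirrors the result instead of raising an error.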
And,
def get_texture(self, img, verts_img, mesh):
    ...
    for face in mesh.faces:
        face_vertices, face_normals, tc, material = face
        ...
        # Y-axis Flipping
        tri1 = np.float32([[[(h - int(verts_img[face_vertices[0] - 1, 1])),
                             int(verts_img[face_vertices[0] - 1, 0])],
                            [(h - int(verts_img[face_vertices[1] - 1, 1])),
                             int(verts_img[face_vertices[1] - 1, 0])],
                            [(h - int(verts_img[face_vertices[2] - 1, 1])),
                             int(verts_img[face_vertices[2] - 1, 0])]]])
        tri2 = np.float32(
            [[[4096 - self.texcoords[tc[0] - 1][1] * 4096, self.texcoords[tc[0] - 1][0] * 4096],
              [4096 - self.texcoords[tc[1] - 1][1] * 4096, self.texcoords[tc[1] - 1][0] * 4096],
              [4096 - self.texcoords[tc[2] - 1][1] * 4096, self.texcoords[tc[2] - 1][0] * 4096]]])
        r1 = cv2.boundingRect(tri1)
        r2 = cv2.boundingRect(tri2)
        ...
    return texture
These operations seem to make the fitted mesh compatible with the pyrender pipeline, but in my case they lead to incorrect results.
Solutions
I made the following modifications, which remove the Y-axis flipping:
def detect_kp2d(self, src_img):
    ...
    elif self.kp2d_backend == 'dlib':
        faces = self.detector(sc_img, 1)
        pts = self.face_pred(sc_img, faces[0])
        # modified: keep the landmarks in image coordinates (no Y-axis flip)
        kp2d = np.array([[p.x * fp_scale, p.y * fp_scale] for p in pts.parts()])
    return kp2d
def get_texture(self, img, verts_img, mesh):
    ...
    for face in mesh.faces:
        face_vertices, face_normals, tc, material = face
        ...
        # modified: no Y-axis flip for the image-space triangle;
        # the texture-space triangle (tri2) keeps its flip
        tri1 = np.float32([[[int(verts_img[face_vertices[0] - 1, 1]),
                             int(verts_img[face_vertices[0] - 1, 0])],
                            [int(verts_img[face_vertices[1] - 1, 1]),
                             int(verts_img[face_vertices[1] - 1, 0])],
                            [int(verts_img[face_vertices[2] - 1, 1]),
                             int(verts_img[face_vertices[2] - 1, 0])]]])
        tri2 = np.float32(
            [[[4096 - self.texcoords[tc[0] - 1][1] * 4096, self.texcoords[tc[0] - 1][0] * 4096],
              [4096 - self.texcoords[tc[1] - 1][1] * 4096, self.texcoords[tc[1] - 1][0] * 4096],
              [4096 - self.texcoords[tc[2] - 1][1] * 4096, self.texcoords[tc[2] - 1][0] * 4096]]])
        r1 = cv2.boundingRect(tri1)
        r2 = cv2.boundingRect(tri2)
        ...
    return texture
Also, in some cases the cropped image in get_texture will have zero width or zero height, so we can add a check before using img1Cropped to make the code more robust:
# Apply warpImage to small rectangular patches
img1Cropped = img[croppedRectangleInImage[0]:(croppedRectangleInImage[0] + croppedRectangleInImage[2]),
                  croppedRectangleInImage[1]:(croppedRectangleInImage[1] + croppedRectangleInImage[3])]
# ADD THIS: skip the face if the cropped image is empty
if img1Cropped.shape[0] == 0 or img1Cropped.shape[1] == 0:
    continue
warpMat = cv2.getAffineTransform(np.float32(triangleVertexPixelCoordInCroppedImage),
                                 np.float32(triangleVertexPixelCoordInCroppedTexture))
# Get mask by filling triangle
...
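The guard matters because a bounding rectangle whose origin falls outside the image (e.g. a fitted vertex projected past the border) produces an empty NumPy slice, and feeding an empty array to later OpenCV calls raises an error. A small demonstration with fabricated rectangle values:

```python
import numpy as np

img = np.zeros((100, 100), dtype=np.uint8)

# Hypothetical bounding-rect output whose x-origin lies beyond the image.
x, y, w, h = 150, 20, 10, 10
crop = img[y:y + h, x:x + w]
print(crop.shape)  # → (10, 0): the column slice is clipped to nothing

# The added guard skips such triangles instead of crashing.
if crop.shape[0] == 0 or crop.shape[1] == 0:
    pass  # continue, as in the loop above
```

NumPy clamps out-of-range slices to empty rather than raising, so without the explicit check the failure only surfaces downstream.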
After modifications:
landmark detection:
texture extraction:
To align the rendered result with the face in the original image, we only need to add a Y-axis flip to the mesh vertices:
...
orthoRenderer = OrthoRenderModule(resolution=img.shape)
mesh_trimesh = trimesh.Trimesh(vertices=mesh.vertices.copy(),
faces=fitModule.faceScapeBilinearModel.fv_indices_front.copy() - 1,
process=False)
mesh_trimesh.vertices[:, :2] = mesh_trimesh.vertices[:, :2] - np.array([img.shape[1] / 2, img.shape[0] / 2])
mesh_trimesh.vertices = mesh_trimesh.vertices / img.shape[0] * 2
mesh_trimesh.vertices[:, 2] = mesh_trimesh.vertices[:, 2] - 10
# Add Y-axis flipping operation here:
mesh_trimesh.vertices[:, 1] = -mesh_trimesh.vertices[:, 1]
renderedImage, renderedDepth = orthoRenderer(mesh_trimesh, (1, 1), zNear=0.05, zFar=10000)
...
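The normalization above maps pixel-space vertices into an orthographic, NDC-like cube: center x/y at the image center, scale by the image height, push z back, then flip y because image y points down while the renderer's y points up. A numeric sketch with a fabricated image size and vertices:

```python
import numpy as np

h, w = 480, 640  # hypothetical img.shape[:2]
verts = np.array([[320.0, 240.0, 5.0],   # image center
                  [320.0,   0.0, 5.0]])  # top edge of the image (y = 0)

# Same steps as above: center x/y, scale by image height, offset z, flip y.
verts[:, :2] -= np.array([w / 2, h / 2])
verts = verts / h * 2        # equivalent to "/ img.shape[0] * 2"
verts[:, 2] -= 10
verts[:, 1] = -verts[:, 1]

# The image center lands at (0, 0) and the top edge at y = +1,
# i.e. upward in the renderer's y-up clip space.
```

Without the final flip, the top of the image would map to y = -1, which is exactly the vertical mirroring seen in the misaligned render.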