buaavrcg/BakedAvatar

Regarding the mesh loaded during testing

felixshing opened this issue · 7 comments

Hello, I would like to ask a question regarding the mesh loaded during the testing phase. I notice that during testing, the RasterizationModel loads a mesh, and this mesh is the output of training. The model then takes the testing data as input to render the final results here.

I would like to ask what is this trained mesh and what is its function?

Hi, the trained mesh file contains the multi-layer meshes the method produces in the end; each mesh consists of the vertices, faces, UVs, and texture maps. All of these are recorded in the mesh_data.pkl file. During fine-tuning, the trainer loads the meshes and textures exported in the baking stage and fine-tunes the textures on them.
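For a concrete picture, here is a minimal sketch of inspecting such a file. The key names ("vertices", "faces", "uvs", "texture") and the list-of-layers layout are assumptions for illustration only; the actual structure of mesh_data.pkl may differ.

```python
# Hedged sketch: inspect a baked multi-layer mesh file.
# Key names and layout below are assumptions, not the repo's exact format.
import pickle

with open("mesh_data.pkl", "rb") as f:
    layers = pickle.load(f)  # assumed: one entry per mesh layer

for i, layer in enumerate(layers):
    print(f"layer {i}:",
          "verts", layer["vertices"].shape,   # (V, 3) canonical-space positions
          "faces", layer["faces"].shape,      # (F, 3) triangle indices
          "uvs", layer["uvs"].shape,          # (V, 2) texture coordinates
          "texture", layer["texture"].shape)  # baked texture map
```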

Thank you for your reply! I understand that the trained mesh file is the multi-layer mesh the method produces in the end. What I am not so sure about is what this multi-layer mesh actually represents. Originally, I thought it stood for the meshes of each frame in the training set, but then I realized that only a single eight-layer mesh was produced. Is it the canonical mesh or something else? Also, during testing, the code loads this mesh before loading the testing frame (as mentioned in the first post). Why do we need to do that?

Thank you for your time.

Is it the canonical mesh or something else?

Yes, there is only one multi-layer mesh in the file, which is the mesh in the canonical space. This mesh is shared by all frames.

Also, during testing, the code loads this mesh before loading the testing frame (as mentioned in the first post). Why do we need to do that?

I am not sure I understand what you mean. We need to load the mesh for rendering, exactly as we do in the fine-tuning phase.

Yes, there is only one multi-layer mesh in the file, which is the mesh in the canonical space. This mesh is shared by all frames.

Thank you for your reply. If the training output is a canonical mesh, then everything makes sense. Basically, during training we need to load the canonical mesh first and then deform it based on the input FLAME expression and pose. Is this correct?

Moreover, I would like to ask: if I want to train and test on datasets collected by myself, similar to I M Avatar, do I only need 1) FLAME parameter estimation from DECA; 2) background segmentation with ModNet; and 3) landmarks from face alignment, without the need to run semantic segmentation and iris estimation? Is my understanding correct?

Thank you for your reply. If the training output is a canonical mesh, then everything makes sense. Basically, during training we need to load the canonical mesh first and then deform it based on the input FLAME expression and pose. Is this correct?

Yes, you are right.
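To make the flow above concrete: the canonical multi-layer mesh is loaded once and a per-frame deformation driven by the FLAME expression and pose is applied before rasterization. The sketch below only illustrates that structure; function names like load_canonical_mesh and deform_with_flame are hypothetical placeholders, not the repo's actual API, and the deformation itself is left as a pass-through.

```python
# Hedged sketch of the per-frame rendering flow, with placeholder names.
import pickle

def load_canonical_mesh(path="mesh_data.pkl"):
    # The single canonical multi-layer mesh, shared by all frames.
    with open(path, "rb") as f:
        return pickle.load(f)

def deform_with_flame(canonical_layers, expression, pose):
    # Placeholder: BakedAvatar's actual deformation maps canonical vertices
    # to posed/expressive vertices; here we simply pass them through.
    return canonical_layers

def render_test_sequence(test_frames, mesh_path="mesh_data.pkl"):
    canonical = load_canonical_mesh(mesh_path)   # loaded once
    for frame in test_frames:                    # per-frame FLAME parameters
        deformed = deform_with_flame(canonical, frame["expression"], frame["pose"])
        # ... rasterize the deformed layers and composite the final image ...
```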

Moreover, I would like to ask: if I want to train and test on datasets collected by myself, similar to I M Avatar, do I only need 1) FLAME parameter estimation from DECA; 2) background segmentation with ModNet; and 3) landmarks from face alignment, without the need to run semantic segmentation and iris estimation? Is my understanding correct?

I think the semantic segmentation result is used somewhere in the loss to determine the loss weights for different facial parts. Without these weights, the model can still be trained well, but it might lead to a slightly different result.
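As a generic illustration (not the repo's actual loss code), a segmentation map can set per-pixel weights in a photometric loss so that regions like the eyes or mouth count more. The class ids and weight values below are made up for the example.

```python
# Hedged sketch: region-weighted L1 photometric loss using a segmentation map.
import torch

def weighted_photometric_loss(pred, target, seg, region_weights=None):
    # pred, target: (B, 3, H, W) images; seg: (B, H, W) integer class map.
    # Made-up class ids/weights: 0 = skin/background, 1 = eyes, 2 = mouth.
    if region_weights is None:
        region_weights = {0: 1.0, 1: 2.0, 2: 4.0}
    w = torch.ones_like(seg, dtype=pred.dtype)
    for cls, cls_w in region_weights.items():
        w = torch.where(seg == cls, torch.full_like(w, cls_w), w)
    return (w.unsqueeze(1) * (pred - target).abs()).mean()
```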

I think the semantic segmentation result is used somewhere in the loss to determine the loss weights for different facial parts. Without these weights, the model can still be trained well, but it might lead to a slightly different result.

Yeah, but despite this, it seems inference/rendering requires only the pose and expression. Is my understanding correct?

Yes, you are correct.