microsoft/MeshTransformer

[Code] Questions on the loss details

jhcho99 opened this issue · 2 comments

Hello!
Thanks for the great work :)

I have two questions on the loss details in run_metro_bodymesh.py.

  1. According to the paper, L1 loss is used to minimize the error between the predicted joints and the ground-truth joints. However, run_metro_bodymesh.py uses torch.nn.MSELoss() instead. Why do you use MSE loss rather than the L1 loss described in the paper?

  2. Why do you subtract the location of the pelvis twice? During training, lines 207-211 in run_metro_bodymesh.py subtract the pelvis location from the given annotation, but lines 125-128 in run_metro_bodymesh.py then subtract the pelvis location again from the modified annotation produced by lines 207-211.

Could you please check the above questions?
Thanks!

Thanks for raising these questions!

Q&A1: Sorry for the inconsistency between the paper and the code. When we implemented the loss functions, we mainly adopted those from GraphCMR (CVPR 2019). As the GraphCMR authors discuss in their paper, they empirically found that L1 loss gives more stable training than MSE loss. However, their implementation actually uses MSE loss for the 2D/3D joints and L1 loss for the 3D vertices. Replacing MSE with L1 could probably improve our training further, but we haven't tried it.
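To make the difference between the two losses concrete, here is a minimal NumPy sketch, not the code from run_metro_bodymesh.py. The helper names `mse_loss` and `l1_loss` and the toy joint values are assumptions made up for illustration; both helpers average over all elements, which matches the default 'mean' reduction of torch.nn.MSELoss and torch.nn.L1Loss.

```python
import numpy as np

def mse_loss(pred, gt):
    """Mean squared error over all elements (same as the 'mean' reduction)."""
    return float(np.mean((pred - gt) ** 2))

def l1_loss(pred, gt):
    """Mean absolute error over all elements (same as the 'mean' reduction)."""
    return float(np.mean(np.abs(pred - gt)))

# Hypothetical toy data: 3 joints in 3D, not taken from any dataset.
gt = np.zeros((3, 3))
pred = gt.copy()
pred[0] += 0.1    # small error on every coordinate of joint 0
pred[2, 0] = 2.0  # one large error on the x coordinate of joint 2

l1 = l1_loss(pred, gt)
mse = mse_loss(pred, gt)

# Fraction of each loss that comes from the single large error.
outlier_share_l1 = 2.0 / np.abs(pred - gt).sum()
outlier_share_mse = 2.0 ** 2 / ((pred - gt) ** 2).sum()

print(f"L1  = {l1:.4f}, outlier share = {outlier_share_l1:.3f}")
print(f"MSE = {mse:.4f}, outlier share = {outlier_share_mse:.3f}")
```

Because MSE squares each residual, the single large error accounts for a bigger share of the MSE value than of the L1 value. This sensitivity to large residuals is one commonly cited reason the two losses can behave differently during training; the sketch does not show which one works better for METRO.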

Q&A2: In lines 207-211, we prepare the GT data and normalize the 3D GT joints relative to a pre-defined pelvis. When we compute the loss in lines 125-128, we want to make sure that both the prediction and the ground truth are in the same 3D space, so we normalize them again relative to their own pelvis (computed on the fly as (left_hip + right_hip) / 2).
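The two normalization steps can be sketched as follows. This is a simplified NumPy illustration under stated assumptions, not the repository code: the joint indices `R_HIP` and `L_HIP`, the helper `center_on_hip_midpoint`, and the random toy data are all hypothetical.

```python
import numpy as np

# Hypothetical joint layout for illustration only; the real indices
# depend on the joint definition used in the repository.
R_HIP, L_HIP = 0, 1

def center_on_hip_midpoint(joints):
    """Subtract the pelvis, estimated as the hip midpoint, from every joint.

    joints: array of shape (batch, num_joints, 3).
    """
    pelvis = (joints[:, L_HIP, :] + joints[:, R_HIP, :]) / 2
    return joints - pelvis[:, None, :]

rng = np.random.default_rng(0)
gt_raw = rng.normal(size=(2, 4, 3))             # toy GT joints
predefined_pelvis = rng.normal(size=(2, 1, 3))  # stand-in for the annotated pelvis

# Step 1 (data preparation): normalize GT by the pre-defined pelvis.
gt_prepared = gt_raw - predefined_pelvis

# Step 2 (loss computation): re-center GT and prediction on their own hip midpoint.
gt_centered = center_on_hip_midpoint(gt_prepared)

# A prediction with the same pose as the GT but a different global translation.
pred = gt_raw + np.array([5.0, -3.0, 1.0])
pred_centered = center_on_hip_midpoint(pred)

loss = float(np.mean((pred_centered - gt_centered) ** 2))
print(loss)
```

Centering on the hip midpoint removes any global offset, so the second step gives the same result whether or not the first step was applied, and a prediction that differs from the ground truth only by a translation yields a loss of (numerically) zero. In this sketch the second subtraction is what puts both sets of joints in the same root-relative space.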

Thanks for the detailed explanation about my questions!
I greatly appreciate your help :)