facebookresearch/frankmocap

Question about training the hand pose.

zhengsipeng opened this issue · 3 comments

Hi, thanks a lot for your great work.

Recently I've been working on a self-supervised model where I plan to use frankmocap to extract hand poses as pseudo labels instead of using ground truth.
According to Eq. (5) in your paper, the hand module loss is L = L_{theta} + L_{3D} + L_{2D} + L_{reg}. So if I want to use the same criterion as Eq. (5) in my work, I assume I need to use the predictions of your hand module as supervision accordingly, i.e.:
48-dim hand pose for L_{theta}'s label (pred_hand_pose);
10-dim shape for L_{reg}'s label (pred_hand_betas);
21x2-dim joints for L_{2D}'s label (pred_joints_img[:, :2]).

But which output should I use as supervision for L_{3D}? pred_joints_smpl, or something else? I notice that your hand module goes from 3D joints in smplx space -> 2D bbox -> 2D image coordinates; no 3D joints in image space are predicted.
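
In case it helps to make the mapping concrete, here is roughly how I'd collect those outputs as pseudo-labels (a minimal sketch assuming the hand module returns a per-hand output dict with the keys named above; the shapes in the comments are my assumptions and may differ by frankmocap version):

```python
import torch

def make_pseudo_labels(hand_out):
    """Detach one frankmocap hand-module output dict into fixed pseudo-labels."""
    return {
        "theta": torch.as_tensor(hand_out["pred_hand_pose"]).detach(),          # (48,) hand pose
        "beta":  torch.as_tensor(hand_out["pred_hand_betas"]).detach(),         # (10,) shape params
        "j2d":   torch.as_tensor(hand_out["pred_joints_img"])[:, :2].detach(),  # (21, 2) joints in image coords
        "j3d":   torch.as_tensor(hand_out["pred_joints_smpl"]).detach(),        # (21, 3) joints in smplx space
    }
```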

@zhengsipeng You can use pred_joints_smpl to calculate L_{3D}.

Thanks for your reply.
So I guess I can also use pred_hand_pose for L_{theta} and pred_joints_img for L_{2D}, is that right?

Yes.
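
For anyone landing here later, a minimal sketch of combining the four terms of Eq. (5) with the frankmocap outputs above as pseudo-labels. `pseudo` is the dict from the earlier sketch and `pred` is assumed to hold your own model's differentiable predictions in the same layout; the use of MSE for each term and the unit loss weights are assumptions, not the paper's exact settings:

```python
import torch
import torch.nn.functional as F

def hand_pseudo_label_loss(pred, pseudo, w_theta=1.0, w_3d=1.0, w_2d=1.0, w_reg=1.0):
    """L = L_theta + L_3D + L_2D + L_reg, supervised by frankmocap pseudo-labels."""
    loss_theta = F.mse_loss(pred["theta"], pseudo["theta"])  # 48-dim hand pose
    loss_3d    = F.mse_loss(pred["j3d"],   pseudo["j3d"])    # 21x3 joints in smplx space
    loss_2d    = F.mse_loss(pred["j2d"],   pseudo["j2d"])    # 21x2 joints in image coords
    # Per this thread, the 10-dim betas act as the target for L_reg; a plain
    # ||beta||^2 shape regularizer is an alternative if no beta target is used.
    loss_reg   = F.mse_loss(pred["beta"],  pseudo["beta"])
    return w_theta * loss_theta + w_3d * loss_3d + w_2d * loss_2d + w_reg * loss_reg
```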