Is fitting like smplify-x still needed for TCMR?
lucasjinreal opened this issue · 13 comments
https://github.com/vchoutas/smplify-x
smplify-x runs an optimization loop to fit the shape. Does TCMR still need such a step, given that it already outputs 3D keypoints, shape, and camera parameters?
Hi @jinfagang,
I don't think TCMR needs another optimization loop using smplify-x.
For example, smplify-x requires 2D joint locations for fitting, but the 2D pose may be inaccurate in some video frames.
Thus, additional fitting on TCMR 3D meshes could have adverse effects.
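For readers unfamiliar with what such a fitting step minimizes, here is a minimal sketch of a confidence-weighted 2D reprojection error. It is not the smplify-x or TCMR code: the function names are hypothetical, and the weak-perspective convention `s * (xy + t)` is an assumption borrowed from how HMR-style regressors commonly parameterize the camera.

```python
import numpy as np

def weak_perspective_project(joints3d, cam):
    """Project (J, 3) joints to 2D with an assumed weak-perspective camera.

    cam = (s, tx, ty): scale and 2D translation (hypothetical convention).
    """
    s, tx, ty = cam
    return s * (joints3d[:, :2] + np.array([tx, ty]))

def reprojection_error(joints3d, cam, joints2d, conf):
    """Confidence-weighted mean squared distance between projected and detected joints."""
    proj = weak_perspective_project(joints3d, cam)
    sq_dist = np.sum((proj - joints2d) ** 2, axis=1)
    # Joints with zero confidence contribute nothing to the error.
    return float(np.sum(conf * sq_dist) / max(np.sum(conf), 1e-8))
```

A fitting method would adjust pose, shape, and camera to reduce this value per frame. If the detected 2D joints are wrong but carry high confidence, the optimizer pulls the mesh toward the wrong location, which is the adverse effect described above.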
I don't think TCMR does not need another optimization loop using smplify-x.
You mean TCMR does need another optimization loop like smplify-x? It fits the shape by projecting the 3D joints back to 2D and minimizing the error.
They do that because it makes the 3D model fit the image better at the exact person location.
What do you think?
My mistake. I don’t think TCMR needs another optimization.
@hongsukchoi thanks, that makes sense. From my experience, even adding another fitting step with loss propagation doesn't seem to improve anything.
I still have one last question about the code.
TCMR is quite simple: detection, then tracking, then the extracted features are sent to a temporal model that produces 3D pose, camera, and shape predictions (correct me if I am wrong).
But where do the 2D joints come from?
More importantly, it seems the bboxes are overridden and no 2D joint information is used.
So, did you experiment with 2D joints in TCMR and decide not to use them in the final solution?
@jinfagang,
The TCMR code, including the demo code, is based on VIBE, which TCMR improves on.
The 2D joints are legacy variables from VIBE, and I don't use them. I left them in because some people who know VIBE might use them.
@hongsukchoi thanks for clarifying. Do you think joints2d will improve accuracy? Is it necessary?
It depends on the definition of accuracy.
I think there are three kinds of accuracy.
1. Per-frame accuracy (= 3D joint accuracy)
2. Temporal accuracy (= naturalness of motion = acceleration error)
3. Rendering accuracy
Using "joints2d" may increase (1) per-frame accuracy. But (2) temporal accuracy is likely to decrease, since the newly fitted per-frame meshes may be temporally inconsistent. We can have a video chat for further discussion.
Using "joints2d" may also increase (3) rendering accuracy. TCMR's meshes are temporally consistent, but when rendered, the overall size (scale) of the meshes sometimes varies, because the predicted camera projection parameters vary. The 2D joints may keep the meshes' overall scale consistent, if the 2D joints are accurate.
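To make the temporal accuracy point concrete, here is a minimal sketch of an acceleration error computed with finite differences over consecutive frames. The function name is hypothetical and this is not taken from the TCMR evaluation code; it assumes predicted and ground-truth 3D joints of shape `(T, J, 3)` in the same units.

```python
import numpy as np

def acceleration_error(pred, gt):
    """Mean difference between predicted and ground-truth joint accelerations.

    pred, gt: arrays of shape (T, J, 3) with T >= 3 frames.
    Acceleration is approximated by the second finite difference over time.
    """
    accel_pred = pred[:-2] - 2 * pred[1:-1] + pred[2:]
    accel_gt = gt[:-2] - 2 * gt[1:-1] + gt[2:]
    # Per-joint Euclidean norm, averaged over frames and joints.
    return float(np.mean(np.linalg.norm(accel_pred - accel_gt, axis=2)))
```

Under this definition, a per-frame fit that moves each mesh independently can lower the per-frame joint error while raising the acceleration error, because frame-to-frame jitter shows up directly in the second difference.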
@hongsukchoi thanks. Do you mean using kyp2d as TCMR input, or using kyp2d as GT for another loss-based SMPL regression to get the final mesh?
BTW, I noticed that some people use kyp2d to get boxes, then crop the image and send it to TCMR as input. Do you think this is reasonable? I think kyp2d might not be as accurate as a box detector. If kyp2d fails, the boxes derived from it will fail, and the whole window sent to TCMR might be affected.
Do you mean using kyp2d as TCMR input, or using kyp2d as GT for another loss-based SMPL regression to get the final mesh?
The 2D joints can be used for post-processing the predicted meshes, leveraging fitting methods like SMPLify.
I suppose that's closer to the latter option you described.
BTW, I noticed that some people use kyp2d to get boxes, then crop the image and send it to TCMR as input. Do you think this is reasonable? I think kyp2d might not be as accurate as a box detector. If kyp2d fails, the boxes derived from it will fail, and the whole window sent to TCMR might be affected.
kyp2ds from bottom-up 2D pose estimators may be better. Actually, conventional object detectors are also noisy and tend to capture only visible areas, whereas bottom-up 2D pose estimators can predict joints in invisible areas. They also tend to be better at identifying the target person in a crowded scene. You may check out my other paper if you are interested. 3DCrowdNet: https://arxiv.org/abs/2104.07300
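As a rough illustration of deriving a crop box from kyp2d, here is a minimal sketch. The function name, the confidence threshold, and the padding scale are all assumptions for illustration, not values from TCMR or VIBE; it assumes keypoints of shape `(J, 3)` holding x, y, and confidence.

```python
import numpy as np

def bbox_from_keypoints(kp2d, conf_thresh=0.3, scale=1.2):
    """Return a square box (cx, cy, size) around confident keypoints, or None.

    kp2d: array of shape (J, 3) with x, y, confidence per joint.
    conf_thresh and scale are illustrative defaults, not tuned values.
    """
    valid = kp2d[kp2d[:, 2] > conf_thresh, :2]
    if len(valid) == 0:
        return None  # no reliable keypoints in this frame
    x_min, y_min = valid.min(axis=0)
    x_max, y_max = valid.max(axis=0)
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    # Pad the tight box so the crop keeps some context around the person.
    size = scale * max(x_max - x_min, y_max - y_min)
    return float(cx), float(cy), float(size)
```

Returning `None` when no keypoint passes the threshold lets the caller decide how to handle a failed frame, for example by reusing the previous frame's box, so that one bad detection does not corrupt the whole window sent to the temporal model.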
@hongsukchoi thanks. Whether boxes tend to miss invisible areas depends on how the boxes were annotated in the first place. However, I agree that directly using a bottom-up keypoint model and generating the boxes for TCMR from kyp2d would be more appropriate, since it eliminates the first detection model.
Did you read the PyMAF paper? How do you think its performance compares with TCMR? What could be that method's weakness?
I read it briefly. TCMR's contribution is learning good temporal features from video frames, while PyMAF assumes a single-frame input. So I think directly comparing TCMR and PyMAF is inappropriate.
Though if you want my opinion on PyMAF, I think the method is not intuitive. It proposes Mesh Alignment Feedback (MAF), but it does not guarantee that the mesh alignment gets better at each step. Also, its accuracy on 3DPW is not as good as recent SOTA methods.
@hongsukchoi thanks. Can I add you on WeChat? I would really like to talk a bit more about this field. I am currently focused on bringing SOTA models to industry and integrating them into products that serve many users.
Bringing the models to industry sounds awesome! I am sure it's possible, since most of the recent models assume a monocular input.
Unfortunately, I don't have WeChat. I use KakaoTalk as my messenger.
I sometimes use Facebook: https://www.facebook.com/redstone.q/
And I respond to e-mails more quickly than to GitHub. My e-mail: redarknight@snu.ac.kr