IDEA-Research/OSX

What does "wo_decoder" mean?


Hi, I'm jin.

I'm curious about the naming of the pretrained models.

Regarding osx_l_wo_decoder.pth.tar:
It seems I couldn't find the answer I was looking for in issue #19... Does the file named "osx_l_wo_decoder.pth.tar" refer to an OSX model trained without the decoder, pretrained on the MPII, MSCOCO-WholeBody, and H36M datasets, and then further trained on the UBody dataset?

Also, could you explain why the following happens? I ran demo.py on an arbitrary image from the UBody dataset.

The original image is here:
[image: original input frame from the UBody Singing sequence]

With this command, the result is good:
python demo.py --gpu 0 --img_path /workspace/jin/OSX/dataset/UBody/images/Singing/Singing_S1_Trim2/Singing_S1_Trim2_scene003/000020.png --output_folder /workspace/jin/OSX/demo --decoder_setting wo_decoder --pretrained_model_path /workspace/jin/OSX/pretrained_models/osx_l_wo_decoder.pth.tar

[image: good result from osx_l_wo_decoder.pth.tar]

But when I changed the command to this, the result was bad:
python demo.py --gpu 0 --img_path /workspace/jin/OSX/dataset/UBody/images/Singing/Singing_S1_Trim2/Singing_S1_Trim2_scene003/000020.png --output_folder /workspace/jin/OSX/demo --decoder_setting wo_decoder --pretrained_model_path /workspace/jin/OSX/pretrained_models/osx_l_agora.pth.tar

[image: degraded result from osx_l_agora.pth.tar]

I noticed that even though I kept the flag "--decoder_setting wo_decoder" and only switched the pretrained model to "osx_l_agora.pth.tar", the result got worse. I expected the AGORA-pretrained model to produce a better result, but it is worse than before.

Hi, there is a domain gap between the AGORA dataset and in-the-wild images. osx_l_agora.pth.tar is fine-tuned on the AGORA dataset and thus does not perform well on in-the-wild images.

I'm sorry, I think I read about this in the paper, but I missed it and asked again.

I'm also very curious about --decoder_setting wo_decoder.

Looking at the overall architecture, the encoder predicts the body pose, shape, and camera translation, while the decoder predicts the hand and face poses.
So if I train the model with the wo_decoder flag, the hand and face parts can't be trained; how can the evaluation results still be produced?
Instead, I wonder whether such a model should be trained with SMPL rather than SMPL-X and evaluated on non-expressive benchmarks like 3DPW.

Hi, the wo_decoder setting is not the same as the architecture in the paper. Instead, we regress the hand and face parameters directly from the encoder feature.
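Roughly, the idea looks like the minimal PyTorch sketch below (simplified and illustrative, not our exact implementation; the feature dimension, pooling, and head sizes are assumptions): lightweight regression heads read the pooled encoder feature for the hand and face parameters, so the model can still be trained and evaluated on all parts without a decoder.

```python
# Simplified sketch of the wo_decoder setting (illustrative only, not the
# repository's exact code; feature dimension, pooling, and head sizes are
# assumptions). Hand/face parameters are regressed directly from the pooled
# encoder feature, so no transformer decoder is involved.
import torch
import torch.nn as nn

class EncoderOnlyHeads(nn.Module):
    def __init__(self, feat_dim=1024):
        super().__init__()
        # body branch (present with or without the decoder)
        self.body_pose = nn.Linear(feat_dim, 21 * 6)   # 21 body joints, 6D rotations
        self.shape = nn.Linear(feat_dim, 10)           # SMPL-X shape betas
        self.cam = nn.Linear(feat_dim, 3)              # weak-perspective camera
        # wo_decoder: hand/face branches read the same encoder feature
        self.lhand_pose = nn.Linear(feat_dim, 15 * 6)  # 15 joints per hand
        self.rhand_pose = nn.Linear(feat_dim, 15 * 6)
        self.expression = nn.Linear(feat_dim, 10)      # expression coefficients

    def forward(self, enc_tokens):
        # enc_tokens: (B, N, feat_dim) tokens from the ViT encoder
        g = enc_tokens.mean(dim=1)  # global average pool
        return {
            "body_pose": self.body_pose(g),
            "betas": self.shape(g),
            "cam": self.cam(g),
            "lhand_pose": self.lhand_pose(g),
            "rhand_pose": self.rhand_pose(g),
            "expression": self.expression(g),
        }

# quick check with a fake encoder output
tokens = torch.randn(2, 196, 1024)
outputs = EncoderOnlyHeads()(tokens)
print({k: v.shape for k, v in outputs.items()})
```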

@linjing7 Thanks! I had only guessed and wasn't sure, but with your reply I'm now confident that I understand!