hongfz16/EVA3D

Cannot get the same evaluation results as the paper.

Opened this issue · 1 comment

I used the released checkpoint models_0420000.pt and the official inference code on the DeepFashion dataset.
Following the paper, I generated 50k inference results, then calculated the FID and KID between the results and the dataset.

python generation_demo.py --batch 1 --chunk 1 \
    --expname 512x256_deepfashion --dataset_path ./dataset/DeepFashion \
    --depth 5 --width 128 --style_dim 128 --renderer_spatial_output_dim 512 256 \
    --input_ch_views 3 --white_bg \
    --voxhuman_name eva3d_deepfashion \
    --deltasdf --N_samples 28 --ckpt 420000 \
    --identities 50000  #--render_video

For evaluation, I use the torch-fidelity package:

fidelity --gpu 0 --kid --fid --input1 ${my_path} --input2 ${dataset_path}

But I only got FID = 55, which is much worse than the value reported in the paper.
Am I doing something wrong?

Hi. Thank you for the question. Sorry for the late reply and for any confusion.

The checkpoint we released is not the best-performing one. You may use this link to download the checkpoint we used to obtain the FID of around 15.

Please make sure that you download DeepFashion using our link and use the 8037 images listed in train_list.txt. We pad all images from 512x256 to 512x512 with white (see the sketch below). When generating fake samples, do not use truncation and do not use pose-guided sampling; just uniformly sample SMPL parameters from the 8037 training samples.
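
For reference, here is a minimal sketch of the white-padding step using PIL. The folder names are placeholders, not the actual paths in the repo:

from pathlib import Path
from PIL import Image

src = Path("./DeepFashion/images")       # placeholder: the 8037 crops listed in train_list.txt
dst = Path("./real_padded_512x512")      # placeholder: output folder for padded images
dst.mkdir(parents=True, exist_ok=True)

for p in sorted(src.glob("*.png")):
    img = Image.open(p).convert("RGB")                       # 512x256 (height x width) crop
    canvas = Image.new("RGB", (512, 512), (255, 255, 255))   # white square canvas
    # center the crop so only the left/right borders are padded
    canvas.paste(img, ((512 - img.width) // 2, (512 - img.height) // 2))
    canvas.save(dst / p.name)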

We have prepared download links to the 8037 padded real images here and to the 50k fake samples here, which were generated with the checkpoint above. Both are ready for FID calculation; the result should be 15.89.
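
If you prefer calling torch-fidelity from Python instead of the CLI, a sketch along these lines should give the same numbers (the folder names are placeholders for wherever you unpack the two downloads):

import torch_fidelity

# placeholders: the 50k fake samples and the 8037 padded real images
metrics = torch_fidelity.calculate_metrics(
    input1="./fake_50k",
    input2="./real_padded_512x512",
    cuda=True,
    fid=True,
    kid=True,
)
print(metrics)  # dict containing the FID and KID values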