ankanbhunia/PIDM

Quantitative Problem

muzishen opened this issue · 26 comments

real_path = './deepfashion/train_256x256_pngs/'

Thank you very much for open-sourcing your work.
Could you please clarify what these three paths represent and how they were obtained?

Also, during dataset preparation I couldn't find the step 'cd scripts ./download_dataset.sh' mentioned in the README, which is why I directly downloaded the preprocessed data. Can you help me? Thank you.

Here the real_path contains all the training images resized to 256x256. The individual filenames remain the same. Please note while creating this folder: 1. resize using Image.BICUBIC in the PIL library, and 2. save the images in .png (a lossless format).

The folder structure:

train_256x256_pngs/
     deepfashionMENDenimid0000008001_1front.png
     deepfashionMENDenimid0000008001_2side.png
     .....
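
For reference, here is a minimal sketch of how such a folder might be built. The ./deepfashion/img source location, the use of train.lst, and the filename-flattening rule are assumptions based on this thread, not the authors' exact script:

    import os
    from PIL import Image

    src_root = './deepfashion/img/'                  # assumed location of the raw images
    real_path = './deepfashion/train_256x256_pngs/'
    os.makedirs(real_path, exist_ok=True)

    with open('./deepfashion/train.lst') as f:       # assumed list of training image paths
        filenames = [line.strip() for line in f if line.strip()]

    for name in filenames:
        image = Image.open(os.path.join(src_root, name))
        # 1. resize with bicubic interpolation, 2. save losslessly as .png
        new_image = image.resize((256, 256), resample=Image.BICUBIC)
        # flatten the relative path into a single filename; the exact rule is an assumption
        out_name = name.replace('/', '').replace('.jpg', '.png')
        new_image.save(os.path.join(real_path, out_name))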

Hi, I am a beginner.
My understanding is that real_path represents the full training set, while gt_path represents the full test set.
So what does distorated_path refer to?

It refers to the generated images. You can download our results (PIDM.zip) to see what this folder should look like.

Thank you.
I apologize for interrupting again.

I don't understand why FID (Fréchet Inception Distance) is computed using the training set and generated images instead of the test set and generated images.

It's common practice to measure FID between the training set and the generated set when quantifying image synthesis. FID doesn't tell you how well the model can reconstruct images; for that you need LPIPS and SSIM. What FID really measures is how closely the generated images match the quality of the images used for training.
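
As a concrete illustration, FID between two image folders can be computed with the clean-fid package linked later in this thread. The folder names here are just the ones discussed above, and the paper's exact FID implementation may differ:

    from cleanfid import fid

    real_path = './deepfashion/train_256x256_pngs/'  # training images
    distorated_path = './output_images_pngs/'        # generated images (assumed folder name)

    # clean-fid compares the Inception feature statistics of the two folders
    score = fid.compute_fid(real_path, distorated_path)
    print('FID: {:.4f}'.format(score))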

Thank you very much.

I apologize for disturbing you again.

  1. Based on the synthesized results (PIDM.zip) you provided, it appears that the FID results cannot be aligned with the paper's.
    The results are as follows:
    [screenshot]

The FID results seem a bit strange: I first resized the images to (256, 256) using PIL with BICUBIC interpolation.

  2. Additionally, how can I generate the images in PIDM.zip? Could you please provide a script if possible?

I have recomputed the score on my machine as shown below.

[screenshot]

Can you tell me how many images you have inside the real_path folder, and can you also copy-paste the code you used to create the real_path folder?

[screenshot]

real_path: 37016

Can you check the train.lst file? There are around 48K training images.

But as I understand it, the training was done using train_pairs.txt, which consists of 37,016 unique images after removing duplicates.
In theory, shouldn't we calculate the FID using train_pairs.txt as well?
If I have misunderstood, please correct me. Thank you.

Yes, that is true. But we evaluated it on the entire training set, similar to baselines. 

Also, I wanted to understand why you are getting such a low FID. So, could you please try it on the 48K images and let me know if you are able to get a similar FID?

Sure, I will give it a try and let you know the results as soon as possible.

I have created the training images from train_pairs.txt, removing all duplicates, resulting in 37,016 images in the "train_256x256_pngs" folder. I tried it and got the same problematic FID score as @muzishen:
[screenshot]

Next, I tried the 48,674 images the author suggested (train.lst) and got FID = 6.94, not quite the same as the paper. I saved the images as .jpg with:

    image = Image.open(source_real_filename)
    new_image = image.resize((256, 256), resample=PIL.Image.BICUBIC)
    new_image.save(real_filename)

The generated output images are from the PIDM.zip provided by the author. Here are my results :(

[screenshot]

Could you confirm that the exact number of training images on your PC is also 48,674?

I'm trying again with .png and will update soon. [updated in the comments below]

@trungpx Please try the ".png" format, as it's a lossless format. JPEG compression can result in a significant difference compared to .png (https://github.com/GaParmar/clean-fid).
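
A quick toy check (not from the repo) of why the format matters: round-trip the same image through both formats and compare pixels.

    import numpy as np
    from PIL import Image

    # 'example.jpg' is a placeholder for any training image
    image = Image.open('example.jpg').resize((256, 256), resample=Image.BICUBIC)
    image.save('check.png')  # lossless: pixels survive exactly
    image.save('check.jpg')  # lossy: JPEG compression perturbs pixels

    orig = np.asarray(image, dtype=np.int16)
    png_diff = np.abs(orig - np.asarray(Image.open('check.png'), dtype=np.int16)).max()
    jpg_diff = np.abs(orig - np.asarray(Image.open('check.jpg'), dtype=np.int16)).max()
    print(png_diff)  # 0 -- identical
    print(jpg_diff)  # typically > 0, enough to shift FID noticeably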

@ankanbhunia thanks for the useful repo. Finally, I have successfully replicated the results.
[screenshot]

Regarding the question from @muzishen, I am also wondering about it: "how can I generate the images in PIDM.zip? Could you please provide a script if possible?" It would be nice if this could be answered so that I can build some baselines for my research. Thank you.

Thanks for the confirmation.
Please check the gen.py inside the utils folder.
Best of luck!

@trungpx Hello, I have confirmed that there are 48,674 images.
They are in PNG format.
The results are as follows: FID: 6.3798; LPIPS: 0.1678; SSIM: 0.7320.
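
For anyone reproducing these numbers: FID is computed folder-to-folder as above, while LPIPS and SSIM are reconstruction metrics computed pair-wise between gt_path and distorated_path. Below is a minimal sketch using the lpips and scikit-image packages; the folder names and the assumption that ground-truth and generated files share filenames are mine, not necessarily the repo's convention:

    import os
    import numpy as np
    import torch
    import lpips
    from PIL import Image
    from skimage.metrics import structural_similarity as ssim

    gt_path = './deepfashion/gt_256x256_pngs/'   # assumed folder names
    distorated_path = './output_images_pngs/'

    loss_fn = lpips.LPIPS(net='alex')            # LPIPS expects tensors scaled to [-1, 1]
    to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1)[None].float() / 127.5 - 1.0

    lpips_scores, ssim_scores = [], []
    for name in os.listdir(distorated_path):
        gt = np.asarray(Image.open(os.path.join(gt_path, name)).convert('RGB'))
        gen = np.asarray(Image.open(os.path.join(distorated_path, name)).convert('RGB'))
        with torch.no_grad():
            lpips_scores.append(loss_fn(to_tensor(gt), to_tensor(gen)).item())
        ssim_scores.append(ssim(gt, gen, channel_axis=-1, data_range=255))

    print('LPIPS: {:.4f}  SSIM: {:.4f}'.format(np.mean(lpips_scores), np.mean(ssim_scores)))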

@ankanbhunia @muzishen @trungpx
Thanks for your questions. I tried to reproduce the experimental results but failed. I would like to ask whether my procedure is correct.

  1. Split the deepfashion/img dataset into train (48,674) and test (4,038) folders in .jpg format.
  2. Center-crop the images in PIL with img.crop((40, 0, 216, 256)), then resize them with image.resize((256, 256), resample=PIL.Image.BICUBIC) for both train and test sets.
  3. real_path is the 48,674 train images produced by the previous steps, gt_path is the 4,038 test images, and distorated_path is the PIDM images (8,570).

But my result is FID: 8.7594, LPIPS: 0.2020, ssim_256: 0.714411.

Do you know which step was incorrect?
Thanks for your help.

The second step might be the problem. Don't use any crop, just resize as you did. Try it and let me know if that fixed it.

@trungpx Thanks for your reply. Based on your suggestion, I just used PIL to resize the images to (256, 256) with image.resize((256, 256), resample=PIL.Image.BICUBIC), but the resulting LPIPS is 0.339, which is worse than your earlier LPIPS result of 0.1678.

I don't know which step went wrong.

Could you try again with the following:

  1. Split train (48,674 with train.lst) and test (3,144 with test_pairs.txt); open each .jpg, resize, and save to .png, for example as below in my case:
    from PIL import Image
    ...
    source_real_filename = './data/deepfashion/img/' + file
    image = Image.open(source_real_filename)
    new_image = image.resize((256, 256), resample=PIL.Image.BICUBIC)
    real_filename = real_path + s1.replace('.jpg','') + '.png'
    new_image.save(real_filename)
  2. Use the following list for the test images:
    filenames_test = []
    file_txt_test = '{}/test_pairs.txt'.format(root)
    with open(file_txt_test, 'r') as f:
        lines = f.readlines()
    for item in lines:
        filenames_test.extend(item.strip().split(','))
    filenames_test = list(set(filenames_test))  # len = 3144
    Then open each of these .jpg test images with PIL, resize, and save to .png.

root2 = './data/deepfashion/'
for files in tqdm(filenames_test):
    # ... process strings to get the real filename of the original .jpg and open it, then save to .png with PIL
    image = Image.open(source_real_filename)
    new_image = image.resize((256, 256), resample=PIL.Image.BICUBIC)
    new_image.save(real_filename)

8,570 images for the distorted folder (output_images_pngs) (PIDM.zip).
[screenshot]

It should solve the problem.

[screenshot]

@trungpx Thanks for your reply. I have now reproduced the results as you did. The reason for the wrong result was that I used the 256*256 img.zip dataset from DeepFashion instead of img_highres.zip. Thanks again for your sincere help.

Hi to all,
I have a question regarding the number of training pairs. In the paper it is mentioned "Following the same data configuration in [30], we split this dataset into training and testing subsets with 101,966 and 8,570 pairs". However, the train_pairs.txt contains ~37K pairs as you mention. Where does this discrepancy of the number of pairs come from?
Thank you!

Hello,

You can see in the content of train_pairs.txt that each line contains multiple filenames separated by commas, which means that 37,016 lines are not just 37K pairs. The total should be around 101,966 pairs. Hope that is clear.

[screenshot]
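
A quick sanity check of the pair count could look like the sketch below. The exact pairing rule isn't stated in this thread, so counting every ordered pair among the filenames on a line is an assumption:

    # count lines, unique filenames, and candidate ordered pairs in train_pairs.txt
    n_lines, names, n_pairs = 0, set(), 0
    with open('train_pairs.txt') as f:
        for line in f:
            files = [s for s in line.strip().split(',') if s]
            if not files:
                continue
            n_lines += 1
            names.update(files)
            k = len(files)
            n_pairs += k * (k - 1)  # assumed rule: all ordered pairs within a line

    # the thread reports 37,016 unique images and ~101,966 pairs
    print(n_lines, len(names), n_pairs)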