WillSuen/LostGANs

How to get the scores?

ZejianLi opened this issue · 16 comments

Hi, may I know how to get the reported scores? I used the code mentioned in #3 and the provided pre-trained model but failed to get the same results. I may be missing some details.

Could you describe whether you compute the FID score between the validation images and the images generated from the validation layouts, or between the training images and the images generated from the training layouts? Are the images, whether 64x64 or 128x128, resized to 224x224 before the computation? In training, only images with 3 to 8 objects are used. Is this selection rule also applied when computing FID?

Also, when computing LPIPS, is the training set or the validation set (with its layouts) used? And for the classification accuracy, are the images generated from the training or the validation layouts?

I would be very pleased if you could explain these details or even share the code to reproduce the reported results. Looking forward to your reply.

Best,

Zejian

Hi, thanks for your interest. For FID, I calculate the score between the validation images and the results generated from the validation layouts. The images are resized to 64x64 and 128x128, and the same selection rule is applied. For COCO-Stuff we generate five images for each layout; for VG we generate one.

For LPIPS, the validation set is used. For classification accuracy, we generate images from the validation layouts and test on real validation images.
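
As a rough illustration of the selection rule mentioned above (only layouts with 3 to 8 objects), the filter could look like the sketch below. The dictionary format with an `objects` list and the function name `filter_layouts` are hypothetical and are not the data format used by the repository's loaders.

```python
def filter_layouts(layouts, min_objects=3, max_objects=8):
    """Keep only layouts whose object count lies in [min_objects, max_objects]."""
    return [
        layout for layout in layouts
        if min_objects <= len(layout["objects"]) <= max_objects
    ]


# Hypothetical toy layouts; real loaders store labels and boxes differently.
layouts = [
    {"image_id": 1, "objects": ["person", "dog"]},           # 2 objects: dropped
    {"image_id": 2, "objects": ["person", "dog", "grass"]},  # 3 objects: kept
    {"image_id": 3, "objects": ["car"] * 9},                 # 9 objects: dropped
]
kept = filter_layouts(layouts)
print([layout["image_id"] for layout in kept])
```

The same filter would have to be applied to both the real validation images and the layouts used for generation, so that the two sets being compared cover the same images.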

Let me know if you have any further questions.

Thank you very much for your detailed reply. I will try again.

Hi, thanks for your work. I have downloaded your pre-trained model for the COCO dataset. Following your instructions, I randomly generated five images for each layout (3097 * 5 in total). Using your cocostuff_loader, I got 3097 validation images for the COCO validation set and resized all of them to 128*128 (are these processing steps right?). Then I calculated the FID score between the generated images and the validation ones (the FID code follows the link you gave; TensorFlow version 1.14.0), but only got 39.23, which differs from the 29.65 reported in your paper. Is any step wrong?
And for the IS score, which pre-trained model do you use? Could you please provide the link? And what number of splits did you set? I used the pre-trained model downloaded automatically by the code and only got 11.47, compared with the 13.8 reported in your paper. The other parameters of the IS code remain the same.
I would really appreciate it if you could explain these details and help me reproduce the reported results. Looking forward to your reply. Thank you very much!

Hi,

For FID, the processing steps seem the same as what I did. Did you check the generated images and validation images? Did they look reasonable? I checked my setup: the TensorFlow version is 1.12.0, but I am not sure whether this makes a difference.

For the IS score, the code and pretrained model are from the official link. We followed the sg2im paper: generate 5 images for each layout and use split=5 to get the IS. Is this setting the same as yours?
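
To make the role of the split setting concrete, here is a minimal sketch of how an Inception Score is typically computed per split and then averaged. It assumes the per-image class probabilities (the softmax outputs of an Inception classifier) are already available. The function name `inception_score` and the pure-Python form are illustrative; this is not the official evaluation code.

```python
import math


def inception_score(probs, splits=5):
    """Return (mean, std) of the Inception Score over `splits` chunks.

    `probs` is a list of per-image class-probability vectors p(y|x),
    assumed to come from an Inception classifier. Needs len(probs) >= splits.
    """
    n = len(probs)
    scores = []
    for k in range(splits):
        part = probs[k * n // splits:(k + 1) * n // splits]
        num_classes = len(part[0])
        # Marginal class distribution p(y) within this split.
        p_y = [sum(p[c] for p in part) / len(part) for c in range(num_classes)]
        # Sum of KL(p(y|x) || p(y)) over the images in this split.
        kl = 0.0
        for p in part:
            kl += sum(pc * math.log(pc / p_y[c]) for c, pc in enumerate(p) if pc > 0)
        scores.append(math.exp(kl / len(part)))
    mean = sum(scores) / splits
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / splits)
    return mean, std
```

Because the marginal p(y) is estimated inside each split, the number of splits and the number of images per split both affect the score, which is why the two setups need to match before the numbers are compared.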

Thanks

Hi, thanks for your reply! I have checked the sg2im paper, but I am confused. In the sg2im paper, the authors split the COCO-Stuff validation set into 1024 images for val and 2048 for test, but did not provide the exact split. Could you please provide the data split or a related link? (I got 3097 COCO-Stuff validation images after the preprocessing; how should I split them into val and test sets?) Looking forward to your reply. Thank you very much!

Hi, sg2im did not provide the split. I just randomly shuffle and select 2048 layouts (2048*5 images in total) for IS. For FID, I use all of them, since using more samples is recommended for FID.
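
The random selection described above could be sketched as follows. The function name `select_images_for_is` and the fixed seed are assumptions added for reproducibility; the thread does not say whether a seed was used.

```python
import random


def select_images_for_is(num_layouts, num_selected, samples_per_layout, seed=0):
    """Pick `num_selected` layout ids at random and list all their samples.

    Returns (layout_id, sample_id) pairs; mapping these to file names is left
    to the caller because the naming scheme is project specific.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) so the subset is reproducible
    layout_ids = rng.sample(range(num_layouts), num_selected)
    return [(lid, s) for lid in sorted(layout_ids) for s in range(samples_per_layout)]


pairs = select_images_for_is(num_layouts=3097, num_selected=2048, samples_per_layout=5)
print(len(pairs))  # 2048 * 5 = 10240
```

Selecting at the layout level, rather than sampling individual images, keeps all five samples of each chosen layout together.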

Hi, I have checked everything I can check, but the results are still far from what you reported in your paper (11.6 vs 13.8 for IS and 38.1 vs 29.65 for FID). I have done the following:

1. generate 5 images for each layout from the processed COCO-Stuff validation set (3097 images in total). The code is as follows (this is the only part of your test.py that I modified):

for idx, data in enumerate(dataloader):
    real_images, label, bbox = data
    real_images, label, bbox = real_images.cuda(), label.long().unsqueeze(-1).cuda(), bbox.cuda()
    for s_i in range(args.sample_num):  # sample_num=5
        z_obj = torch.from_numpy(truncted_random(num_o=num_o, thres=thres)).float().cuda()
        z_im = torch.from_numpy(truncted_random(num_o=1, thres=thres)).view(1, -1).float().cuda()
        fake_images = netG.forward(z_obj, bbox, z_im, label.squeeze(dim=-1))
        misc.imsave("{save_path}/sample{s_i}_{idx}.jpg".format(save_path=args.sample_path, s_i=s_i, idx=idx), fake_images[0].cpu().detach().numpy().transpose(1, 2, 0)*0.5+0.5)
2. calculate IS score: randomly choose 2048*5 images (generated from 2048 layouts) to calculate IS score

import os
import random

import numpy as np
from PIL import Image

# get_inception_score is provided by the IS code used for evaluation
path = args.path  # directory containing the test images
for root, dirs, files in os.walk(path):
    # randomly choose 2048*5 images (generated from 2048 layouts)
    if args.random_select == 1:
        new_files = []
        select_index = random.sample(range(3097), 2048)
        # img_name: sample{i}_{xxxx}.jpg, where xxxx is the index of the source layout and 0 <= i < 5
        for img_name in files:
            layout_id = int(img_name.split('_')[-1].split('.')[0])
            if layout_id in select_index:
                new_files.append(img_name)
        files = new_files
    random.shuffle(files)
    images = []
    for img_name in files:
        img_path = os.path.join(root, img_name)
        img = Image.open(img_path)
        img = np.array(img)
        images.append(img)
    is_score = get_inception_score(images, 5)  # split = 5
    print('IS score', is_score)
3. calculate FID score: compute the FID score between the generated images (3097 * 5) and the processed COCO-Stuff validation set (3097 * 5; here I repeat each image 5 times).

Are these steps the same as yours? Could you please provide your code for calculating the IS and FID scores, if convenient? I would really, really appreciate it if you could answer the above questions! Thank you very much!

Hi, the code for IS seems good to me. For FID, I did not repeat the real images 5 times (only the generated images have 5 per layout). I have attached the generated images I got by running the code; could you please test on them? For IS, you can test on all images (I got 13.96 on all images, and in my experiments there is not a big difference between 2048*5 and 3097*5). Also, could you share your generated images so that I can check what is wrong?
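
For reference, FID compares the mean and covariance of Inception features computed separately for the real set and the generated set, so the two sets do not need to contain the same number of images. In the formula below, $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ denote the feature statistics of the real and generated images; the symbol names are chosen here for illustration.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```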

Hi, I have generated the images from the pre-trained model G_coco.pth again and the scores don't change (for IS, it is about 11.5). I have uploaded the generated images. Is G_coco.pth the right version you meant to upload? Could you please share the 64 * 64 pre-trained model, and the generated and real images? I got the real images by adding one line of code to your test.py:
misc.imsave("{save_path}_real/sample_{idx}.jpg".format(save_path=args.sample_path, idx=idx), real_images[0].cpu().detach().numpy().transpose(1, 2, 0)*0.5+0.5)
(changing fake_images to real_images).
But for 64 * 64 real images, I only got an IS score of about 13.5, which is also far from the 16.3 in the paper. How do you resize the real images? Looking forward to your reply. Thank you very much!

Hi, I checked the images you generated and found some discontinuities at the corners and edges.
(Comparison image: left is generated by you, right is mine.)

I ran the code with PyTorch versions 1.4.0 and 1.0.0, and found that 1.4.0 shows a similar issue to yours. Could you please change the torch version to 1.0.0 and try again? (They might have changed some behavior of upsample or something else; I will try to solve it.)

For 64x64 real images, I directly use the IS score from the sg2im paper. I obtained the real images the same way you did and also get ~13 for 64x64 images, but I think this is not a big issue.

Thanks for your reply! I have checked that my PyTorch version is 1.0.0. Is there any other possible cause of this problem? Could it be caused by scipy (because the code uses scipy to save images)?

I don't think it is a scipy issue, as the images you generated look good at the center but weird at the edges. But you can give it a try if you want (my scipy version is 1.2.0).
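
Since several library versions are being compared in this thread, a small script that reports them in one place can help. This is a minimal sketch assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages whose versions were compared in this thread.
for name, version in installed_versions(["torch", "scipy", "tensorflow"]).items():
    print(f"{name}: {version or 'not installed'}")
```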

Yeah, my scipy version is 1.2.1. I will re-install PyTorch and try again. Thank you very much! If you find anything else that could cause the problem, please let me know. Thanks!

I have solved the problem. Thanks!

Hi, @JohnDreamer could you kindly elaborate on how you solved the problem? Thanks!

Hi, I just found that I had modified some parameters to remove the warnings. So I re-downloaded the code and re-ran it. Everything is fine!