koushiksrivats/FLIP

Cannot reproduce the performance by ViT

Opened this issue · 2 comments

Hi, thank you for the impressive research.

I am able to reproduce the results of your proposed model, but I cannot reproduce the performance of the ViT baseline experiments on the MCIO protocols.

I am just wondering whether you were able to reproduce the ViT performance; my TPR@FPR=1% differs from the paper by about 2-30%.

Here are the results I got by running python train_vit.py --config for each of M, C, I, and O (launched as in the loop sketched below).
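For concreteness, the four runs can be launched back-to-back with a small loop like the following. This is a hypothetical convenience wrapper; only the --config flag comes from the command above, and anything else about the script's CLI is assumed:

```python
# Hypothetical wrapper around the repo's training script; only the
# --config flag is taken from the command above, the rest is assumed.
import subprocess

for protocol in ["M", "C", "I", "O"]:
    subprocess.run(["python", "train_vit.py", "--config", protocol], check=True)
```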

Can you give any suggestions regarding these results?

Protocol M:
Run, HTER (%), AUC (%), TPR@FPR=1% (%)
0, 6.33, 98.43, 76.67
1, 9.75, 97.43, 63.33
2, 6.58, 96.64, 60.00
3, 5.00, 98.27, 68.33
4, 8.42, 95.35, 65.00
Mean, 7.22, 97.22, 66.67
Std dev, 1.67, 1.13, 5.68

Protocol C:
Run, HTER (%), AUC (%), TPR@FPR=1% (%)
0, 13.43, 94.67, 55.32
1, 10.65, 96.79, 63.12
2, 9.26, 96.38, 53.90
3, 10.65, 95.23, 25.53
4, 9.26, 96.96, 60.99
Mean, 10.65, 96.01, 51.77
Std dev, 1.52, 0.90, 13.56

Protocol I:
Run, HTER (%), AUC (%), TPR@FPR=1% (%)
0, 12.39, 94.79, 24.62
1, 16.19, 91.21, 29.23
2, 14.63, 94.18, 34.62
3, 15.23, 92.93, 30.00
4, 13.88, 94.23, 38.46
Mean, 14.46, 93.47, 31.38
Std dev, 1.29, 1.28, 4.75

Protocol O:
Run, HTER (%), AUC (%), TPR@FPR=1% (%)
0, 20.00, 87.73, 4.37
1, 20.42, 86.17, 2.39
2, 20.51, 87.46, 14.23
3, 20.53, 88.02, 20.70
4, 19.74, 87.68, 22.39
Mean, 20.24, 87.41, 12.82
Std dev, 0.32, 0.65, 8.20
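For context, here is a minimal sketch of one common way HTER, AUC, and TPR@FPR=1% are computed from per-sample scores. The label and score conventions and the EER-based threshold are assumptions that vary between codebases; this is not necessarily this repository's evaluation code:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def spoof_metrics(labels, scores):
    """HTER, AUC, and TPR@FPR=1% from binary labels and liveness scores.

    Assumed convention: label 1 = real (live), 0 = attack;
    higher score = more likely real.
    """
    fpr, tpr, thresholds = roc_curve(labels, scores)
    roc_auc = auc(fpr, tpr) * 100.0

    # TPR at FPR = 1%, linearly interpolated along the ROC curve.
    tpr_at_fpr1 = np.interp(0.01, fpr, tpr) * 100.0

    # HTER = (FAR + FRR) / 2, here evaluated at the EER point
    # (some codebases fix the threshold on a development set instead).
    frr = 1.0 - tpr
    eer_idx = int(np.nanargmin(np.abs(fpr - frr)))
    hter = (fpr[eer_idx] + frr[eer_idx]) / 2.0 * 100.0
    return hter, roc_auc, tpr_at_fpr1
```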

Hi
Thank you for your interest in our work and apologies for the delayed response.
I am glad to know that you were able to reproduce the results from our paper.

With regards to the ViT method, yes I was able to reproduce the results from the baseline paper and achieve results very close to the originally reported numbers.

From your results, I understand that you have been trying to reproduce the results for the 0-shot setting.
As you mention, yes, these numbers do differ considerably from the original.

One suggestion would be to verify your input samples and their pre-processing, which plays a major role.
Could you share a few of your pre-processed samples so I can get a better understanding? That said, since you are able to reproduce the results of our method, this is most likely not the issue.

Also, consider playing around with the hyper-parameters.
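One more thing worth checking is whether all sources of randomness are seeded, since your tables show non-trivial run-to-run variance. A minimal sketch, assuming a standard PyTorch training setup (generic boilerplate, not this repository's actual code):

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 0):
    # Seed every common source of randomness so repeated runs are comparable.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for deterministic cuDNN convolution kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```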

attack_client001_android_SD_iphone_video_scene01_frame1
attack_client001_laptop_SD_iphone_video_scene01_frame0
real_client003_android_SD_scene01_frame0
real_client005_laptop_SD_scene01_frame1
real_client007_android_SD_scene01_frame0
real_client007_laptop_SD_scene01_frame1

Hello, thank you for the impressive research. I encountered the same problem. Above are some of the samples I have processed. The pre-processing uses MTCNN, cropping the detected face to size 224 with chips = detector.extract_image_chips(img, points, 224, 0.37), and frames are selected following SSDG.
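For clarity, the pipeline described above roughly corresponds to the sketch below, assuming the mxnet_mtcnn_face_detection implementation of MTCNN, whose extract_image_chips(img, points, desired_size, padding) matches the call shown; the model folder, paths, and filenames are illustrative:

```python
# Minimal sketch of the MTCNN pre-processing described above; assumes the
# mxnet_mtcnn_face_detection implementation (MtcnnDetector). Paths are
# illustrative.
import cv2
import mxnet as mx
from mtcnn_detector import MtcnnDetector

detector = MtcnnDetector(model_folder='model', ctx=mx.cpu(0),
                         num_worker=1, accurate_landmark=True)

img = cv2.imread('frame.jpg')        # one frame selected from the video
result = detector.detect_face(img)   # returns (boxes, landmarks) or None
if result is not None:
    boxes, points = result
    # Align each detected face and crop it to 224x224 with 0.37 padding.
    chips = detector.extract_image_chips(img, points, 224, 0.37)
    for i, chip in enumerate(chips):
        cv2.imwrite('chip_%d.jpg' % i, chip)
```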