koushiksrivats/FLIP

Cannot reproduce the performance by ViT

Opened this issue · 2 comments

Hi, thank you for the impressive research.

I am able to reproduce the results of your proposed model, but I cannot reproduce the performance of the ViT baseline experiments on the MCIO protocols.

I am just wondering whether you were able to reproduce the ViT performance; my TPR@FPR=1% differs from the paper by about 2-30%.

Here are the results I got by running python train_vit.py --config for each of M, C, I, and O (launched as in the loop sketched below).
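For concreteness, the four runs can be launched back-to-back with a small loop like the following. This is a hypothetical convenience wrapper; only the --config flag comes from the command above, and anything else about the script's CLI is assumed:

```python
# Hypothetical wrapper around the repo's training script; only the
# --config flag is taken from the command above, the rest is assumed.
import subprocess

for protocol in ["M", "C", "I", "O"]:
    subprocess.run(["python", "train_vit.py", "--config", protocol], check=True)
```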

Can you give any suggestions regarding these results?

Protocol M:
Run, HTER (%), AUC (%), TPR@FPR=1% (%)
0, 6.33, 98.43, 76.67
1, 9.75, 97.43, 63.33
2, 6.58, 96.64, 60.00
3, 5.00, 98.27, 68.33
4, 8.42, 95.35, 65.00
Mean, 7.22, 97.22, 66.67
Std dev, 1.67, 1.13, 5.68

Protocol C:
Run, HTER (%), AUC (%), TPR@FPR=1% (%)
0, 13.43, 94.67, 55.32
1, 10.65, 96.79, 63.12
2, 9.26, 96.38, 53.90
3, 10.65, 95.23, 25.53
4, 9.26, 96.96, 60.99
Mean, 10.65, 96.01, 51.77
Std dev, 1.52, 0.90, 13.56

Protocol I:
Run, HTER (%), AUC (%), TPR@FPR=1% (%)
0, 12.39, 94.79, 24.62
1, 16.19, 91.21, 29.23
2, 14.63, 94.18, 34.62
3, 15.23, 92.93, 30.00
4, 13.88, 94.23, 38.46
Mean, 14.46, 93.47, 31.38
Std dev, 1.29, 1.28, 4.75

Protocol O:
Run, HTER (%), AUC (%), TPR@FPR=1% (%)
0, 20.00, 87.73, 4.37
1, 20.42, 86.17, 2.39
2, 20.51, 87.46, 14.23
3, 20.53, 88.02, 20.70
4, 19.74, 87.68, 22.39
Mean, 20.24, 87.41, 12.82
Std dev, 0.32, 0.65, 8.20
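For context, here is a minimal sketch of one common way HTER, AUC, and TPR@FPR=1% are computed from per-sample scores. The label and score conventions and the EER-based threshold are assumptions that vary between codebases; this is not necessarily this repository's evaluation code:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def spoof_metrics(labels, scores):
    """HTER, AUC, and TPR@FPR=1% from binary labels and liveness scores.

    Assumed convention: label 1 = real (live), 0 = attack;
    higher score = more likely real.
    """
    fpr, tpr, thresholds = roc_curve(labels, scores)
    roc_auc = auc(fpr, tpr) * 100.0

    # TPR at FPR = 1%, linearly interpolated along the ROC curve.
    tpr_at_fpr1 = np.interp(0.01, fpr, tpr) * 100.0

    # HTER = (FAR + FRR) / 2, here evaluated at the EER point
    # (some codebases fix the threshold on a development set instead).
    frr = 1.0 - tpr
    eer_idx = int(np.nanargmin(np.abs(fpr - frr)))
    hter = (fpr[eer_idx] + frr[eer_idx]) / 2.0 * 100.0
    return hter, roc_auc, tpr_at_fpr1
```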

Hi
Thank you for your interest in our work and apologies for the delayed response.
I am glad to know that you were able to reproduce the results from our paper.

With regards to the ViT method, yes I was able to reproduce the results from the baseline paper and achieve results very close to the originally reported numbers.

From your results, I understand that you have been trying to reproduce the results for the 0-shot setting.
As you mention, yes, these numbers do differ considerably from the original.

One suggestion would be to verify your input samples and their pre-processing, which plays a major role.
Could you share a few of your pre-processed samples so I can get a better understanding? That said, since you are able to reproduce the results of our method, this is most likely not the issue.

Also, consider playing around with the hyper-parameters.
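One more thing worth checking is whether all sources of randomness are seeded, since your tables show non-trivial run-to-run variance. A minimal sketch, assuming a standard PyTorch training setup (generic boilerplate, not this repository's actual code):

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 0):
    # Seed every common source of randomness so repeated runs are comparable.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for deterministic cuDNN convolution kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```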

attack_client001_android_SD_iphone_video_scene01_frame1
attack_client001_laptop_SD_iphone_video_scene01_frame0
real_client003_android_SD_scene01_frame0
real_client005_laptop_SD_scene01_frame1
real_client007_android_SD_scene01_frame0
real_client007_laptop_SD_scene01_frame1

Hello, thank you for the impressive research. I encountered the same problem. Above are some of the samples I have processed. The pre-processing uses MTCNN, cropping the detected face to size 224 with chips = detector.extract_image_chips(img, points, 224, 0.37), and frames are selected following SSDG.
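For clarity, the pipeline described above roughly corresponds to the sketch below, assuming the mxnet_mtcnn_face_detection implementation of MTCNN, whose extract_image_chips(img, points, desired_size, padding) matches the call shown; the model folder, paths, and filenames are illustrative:

```python
# Minimal sketch of the MTCNN pre-processing described above; assumes the
# mxnet_mtcnn_face_detection implementation (MtcnnDetector). Paths are
# illustrative.
import cv2
import mxnet as mx
from mtcnn_detector import MtcnnDetector

detector = MtcnnDetector(model_folder='model', ctx=mx.cpu(0),
                         num_worker=1, accurate_landmark=True)

img = cv2.imread('frame.jpg')        # one frame selected from the video
result = detector.detect_face(img)   # returns (boxes, landmarks) or None
if result is not None:
    boxes, points = result
    # Align each detected face and crop it to 224x224 with 0.37 padding.
    chips = detector.extract_image_chips(img, points, 224, 0.37)
    for i, chip in enumerate(chips):
        cv2.imwrite('chip_%d.jpg' % i, chip)
```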