Performance check
Flowerfan opened this issue · 7 comments
Hi, thank you for sharing the code and models.
I used ckpt_violet_pretrain.pt and ckpt_violet_msrvtt-retrieval.pt with our own data processing (5 frames sampled at an interval of num_frames // 5) for MSRVTT text-to-video retrieval evaluation.
I got R@1 of 22.6/32.9, which is lower than the numbers in the paper (25.9/34.7). I also tested the CLIP model and got a similar result. Do the released models achieve the reported results?
If so, could you share the processing pipeline or describe how to reproduce the reported performance?
Thank you!
Yes, we uniformly sample 5 frames from each video using extract_video-frame.
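For what it's worth, a minimal sketch of that sampling scheme (assuming the `interval = num_frames // 5` interpretation described earlier in the thread; the actual extract_video-frame script may differ, e.g. by taking segment midpoints):

```python
def sample_frame_indices(num_frames, num_samples=5):
    """Uniformly sample `num_samples` frame indices from a video with
    `num_frames` total frames, using interval = num_frames // num_samples."""
    interval = num_frames // num_samples
    return [i * interval for i in range(num_samples)]

# e.g. a 100-frame video -> frames 0, 20, 40, 60, 80
print(sample_frame_indices(100))
```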
I have re-tested and got 25.9/49.8 from ckpt_violet_pretrain.pt and 34.3/62.9 from ckpt_violet_msrvtt-retrieval.pt.
I am using PyTorch 1.7.0 and transformers 4.18.0 with CUDA 11.0.
Also, don't forget to call model.eval() during evaluation.
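For reference, the 'r@1'/'r@5' numbers quoted in this thread can be computed from a text-video similarity matrix like this (a minimal sketch, not the repo's actual evaluation code; it assumes the standard protocol where the matching video for text query i is at index i, and that the similarity matrix was produced with the model in eval mode):

```python
def recall_at_k(sim, ks=(1, 5)):
    """Text-to-video Recall@K from a similarity matrix `sim`
    (rows = text queries, columns = videos).
    Ground truth is assumed to lie on the diagonal."""
    n = len(sim)
    recalls = {}
    for k in ks:
        hits = 0
        for i, row in enumerate(sim):
            # rank all videos for query i by descending similarity
            topk = sorted(range(n), key=lambda j: -row[j])[:k]
            if i in topk:
                hits += 1
        recalls[f"r@{k}"] = hits / n
    return recalls
```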
Thank you for the re-testing. Could you provide the txt_msrvtt.json file that contains the 1k test videos? There are only 50 videos in https://github.com/tsujuifu/pytorch_violet/blob/main/_data/txt_msrvtt-retrieval.json
I just tested with my txt file and got 'r@1': 0.233, 'r@5': 0.533 with ckpt_violet_pretrain.pt. This is my generated txt file.
I just tested your file with ckpt_violet_pretrain.pt using your repo, but still got 'r@1': 0.233, 'r@5': 0.533. I have no idea what's wrong.
Hi, just wondering how you process the YouCook2 dataset for evaluation, since one video contains multiple clip-text pairs. I extracted the clip-text pairs (3400) for evaluation and got very disappointing performance.
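One common way to set this up is to flatten each video's annotations into independent (clip, caption) pairs and treat every clip as a separate retrieval candidate. A minimal sketch, where the annotation layout (`annotations`, `segment`, `sentence` keys) is a guess at a YouCook2-style structure, not necessarily what this repo expects:

```python
def flatten_clips(annotations):
    """Flatten per-video annotations into (video_id, start, end, caption)
    clip-text pairs for retrieval evaluation.
    `annotations` maps video_id -> {"annotations": [{"segment": [s, e],
    "sentence": str}, ...]} (a hypothetical YouCook2-like layout)."""
    pairs = []
    for vid, ann in annotations.items():
        for seg in ann["annotations"]:
            start, end = seg["segment"]
            pairs.append((vid, start, end, seg["sentence"]))
    return pairs
```

Each resulting pair would then be scored independently, so two clips from the same video count as distinct (and mutually distracting) candidates, which can noticeably depress recall compared to whole-video retrieval.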
> I just tested your file with ckpt_violet_pretrain.pt using your repo, but still got 'r@1': 0.233, 'r@5': 0.533. I have no idea what's wrong.
I just got the same result as you, i.e. 'r@1': 0.233, 'r@5': 0.533.
Have you solved the problem?