Bad performance of Charades

Question

soyeonhong opened this issue 10 months ago · 1 comments

I evaluated the checkpoint provided in the repo on the Charades dataset and got 27.9 for R@1 (IoU = 0.5) and 12.3 for R@1 (IoU = 0.7).
However, the paper reports 32.2 for R@1 (IoU = 0.5) and 13.4 for R@1 (IoU = 0.7).
My R@1 (IoU = 0.5) is particularly low, 4.3 points below the paper. Could you tell me which parameters or methods I need to change to close this gap?
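
For reference, R@1 at an IoU threshold is usually taken to be the percentage of queries whose top-1 predicted time span overlaps the ground-truth span with temporal IoU at or above that threshold. The sketch below is a minimal illustration under that assumption. The names `temporal_iou` and `recall_at_1` are hypothetical and are not taken from the repo's evaluation script, which may differ in details such as how unparseable model outputs are counted.

```python
from typing import List, Tuple

# A time span in seconds: (start, end). Illustrative type, not from the repo.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two 1-D time spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Span], gts: List[Span], threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    if not gts or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and the same length")
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)
```

Calling `recall_at_1(preds, gts, 0.5)` and `recall_at_1(preds, gts, 0.7)` on the same predictions gives the two numbers compared above. If a prediction cannot be parsed into a span, whether it is skipped or counted as a miss changes the denominator, so that choice is worth checking when results differ.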

Answer 1 · 2024-03-14T07:29:19.000Z

Hi, thanks for your interest.

Our released ckpt is different from the version used in the paper. The released ckpt was trained after we cleaned the code and fixed a minor bug in the QuerYD instruction data (some videos have identical start and end timestamps in the raw annotation file, so we use only one timestamp in the revision).
In our evaluation, the released ckpt performs better on YouCook2 than reported in the paper, but worse on Charades-STA and QVHighlights. We also note that the output generated by the LLM differs from run to run, which may cause fluctuations in the evaluation results.
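
Because the scores can vary between runs, one way to judge whether a gap is meaningful is to repeat the evaluation several times and report the mean and standard deviation. The following is a minimal sketch, assuming you have collected one R@1 score per run yourself. `summarize_runs` is a hypothetical helper and is not part of the repo.

```python
from statistics import mean, stdev
from typing import List, Tuple


def summarize_runs(scores: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-run metric scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(scores), stdev(scores)
```

Pass in the list of R@1 values from your own repeated runs. If the difference between two checkpoints is within about one standard deviation, it may be mostly run-to-run noise.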

We have uploaded the ckpt used in our paper; please refer to https://huggingface.co/ShuhuaiRen/TimeChat-7b-paper. With this ckpt, I believe you can reproduce the results in our paper.