
Unable to Reproduce videochatgpt Benchmark Results


vhzy commented

Hello,

Thank you for your open-source contribution. I trained a model using the code you provided, but my results on the videochatgpt benchmark do not match those reported in the paper. My scores across the five metrics are 2.72, 2.47, 3.11, 2.29, and 2.71, giving an average of 2.66, whereas the paper reports an average of 2.89.
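
For reference, the 2.66 above is just the plain mean of the five per-metric scores; here is a minimal, standalone Python sketch (not using any repo code) to double-check the gap:

```python
# Per-metric videochatgpt benchmark scores from my run (listed above).
my_scores = [2.72, 2.47, 3.11, 2.29, 2.71]

# Plain mean over the five metrics.
my_avg = sum(my_scores) / len(my_scores)
reported_avg = 2.89  # average reported in the paper

print(f"my average:       {my_avg:.2f}")                 # -> 2.66
print(f"reported average: {reported_avg:.2f}")           # -> 2.89
print(f"gap:              {reported_avg - my_avg:.2f}")  # -> 0.23
```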
Since different versions of ChatGPT may affect the evaluation scores, could you please provide a pretrained model for testing? That would help verify the results. Thank you for your assistance.