You mean Phi-3 surpassed Mistral-7B?
MonolithFoundation opened this issue · 5 comments
I find this really unexpected: how can a Phi-3 model surpass Mistral-7B, given that VideoChat2 uses a giant vision encoder?
Which part is actually responsible for the improvement?
I appreciate your interest in our work. The VideoChat2 paper reports an average of 60.4 on MVBench with the Mistral-7B LLM. In our case, VideoGPT+ obtains an average score of 58.7 on MVBench with the Phi-3-mini-3.8B LLM.
We have released all the model checkpoints, along with the training and evaluation code, to reproduce our reported results. I hope this helps.
Please let me know if you have any questions. Thank you.
Can the model be trained on eight V100 GPUs?
@mmaaz60 In the first picture, VideoGPT+ surpasses VideoChat2 by a clear margin, but VideoChat2 with Mistral actually gets the better result as of now.
These days, Video MLLMs don't really seem to care which LLM size they use...
Can the model be trained on eight V100 GPUs?
I appreciate your interest in our work. Since we use Phi-3-mini (3.8B parameters) as the LLM, the model can be trained easily on 8 V100 GPUs with 32GB of memory per GPU. However, flash attention has to be turned off, as it is not supported on V100 GPUs.
I hope this helps. Good luck! Please let me know if you face any issues.
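For anyone setting this up, here is a minimal sketch of how the flash attention choice could be made from the GPU's CUDA compute capability. It assumes FlashAttention-2 needs compute capability 8.0 or higher (Ampere and newer), while the V100 is 7.0. The function name `pick_attn_implementation` is hypothetical and not part of the VideoGPT+ code; the returned strings follow the values accepted by the `attn_implementation` argument in Hugging Face Transformers, and the training scripts in the repository may expose this setting differently.

```python
def pick_attn_implementation(compute_capability):
    """Choose an attention backend from a (major, minor) CUDA compute capability.

    Assumption: FlashAttention-2 requires compute capability >= 8.0,
    so older GPUs such as the V100 (7.0) fall back to standard attention.
    """
    major, _minor = compute_capability
    if major >= 8:
        return "flash_attention_2"
    return "eager"


# Compute capabilities for two common data-center GPUs.
V100 = (7, 0)
A100 = (8, 0)

print("V100:", pick_attn_implementation(V100))
print("A100:", pick_attn_implementation(A100))
```

On a real machine, the tuple could come from `torch.cuda.get_device_capability()` instead of the hard-coded values used here.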
@mmaaz60 In the first picture, VideoGPT+ surpasses VideoChat2 by a clear margin, but VideoChat2 with Mistral actually gets the better result as of now.
These days, Video MLLMs don't really seem to care which LLM size they use...
Thank you for your interest in our work. VideoGPT+ uses the Phi-3-mini LLM with only 3.8B parameters, which is relatively weaker than Mistral-7B.
On the other hand, if we compare the Vicuna 7B based models for both VideoGPT+ and VideoChat2, VideoChat2 obtains an average of 51.1 on MVBench, while our Vicuna 7B based variant obtains an average score of 53.1.
Further, there are gains in the VCGBench and VCGBench-Diverse evaluations as well.
We acknowledge that VideoChat2 is a strong video conversation model. However, VideoGPT+ obtains better results on multiple benchmarks, as discussed in our technical report, and all the code to reproduce our reported results is released on GitHub.
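To make the comparison above easier to follow, here is a small sketch that tabulates the MVBench averages quoted in this thread and computes the VideoGPT+ minus VideoChat2 gap for each pairing. The dictionary layout and variable names are only illustrative, and the Mistral-7B row compares different LLMs (Phi-3-mini-3.8B for VideoGPT+), so only the Vicuna 7B row is a like-for-like comparison.

```python
# MVBench average scores as quoted in this thread.
# Each entry: (VideoGPT+ score, VideoChat2 score).
mvbench_avg = {
    "Phi-3-mini-3.8B vs Mistral-7B": (58.7, 60.4),
    "Vicuna 7B vs Vicuna 7B": (53.1, 51.1),
}

# Gap = VideoGPT+ minus VideoChat2, rounded to one decimal place.
gaps = {
    pairing: round(videogpt_plus - videochat2, 1)
    for pairing, (videogpt_plus, videochat2) in mvbench_avg.items()
}

for pairing, gap in gaps.items():
    print(f"{pairing}: {gap:+.1f}")
```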