You mean phi3 surpassed mistral7B?

Question

You mean phi3 surpassed mistral7B?

MonolithFoundation opened this issue 5 months ago · 5 comments

MonolithFoundation commented 5 months ago

I think this is really unexpected. How can a Phi-3 model surpass Mistral-7B, when VideoChat2 uses a giant vision encoder?
Which part is actually doing the work?

Answer 1 · 2024-06-14T14:30:39.000Z

Hi @MonolithFoundation,

I appreciate your interest in our work. The VideoChat2 paper reports an average of 60.4 on MVBench with the Mistral-7B LLM. In our case, VideoGPT+ obtains an average score of 58.7 on MVBench with the Phi-3-mini-3.8B LLM.

We have released all the model checkpoints, along with the training and evaluation code, to reproduce our reported results. I hope this helps.

Please let me know if you have any questions. Thank you.

Answer 2 · 2024-06-15T14:08:05.000Z

Can eight V100 GPUs train the model?

Answer 3 · 2024-06-16T06:37:41.000Z

@mmaaz60 In the first picture, VideoGPT+ surpasses VideoChat2 by a clear margin, but VideoChat2 with Mistral actually gets the better result as of now.

These days, video MLLMs don't really seem to care about which LLM size they use...

Answer 4 · 2024-06-16T06:42:46.000Z

> Can eight V100 GPUs train the model?

Hi @zimenglan-sysu-512,

I appreciate your interest in our work. Since we use Phi-3-mini (3.8B parameters) as the LLM, the model can be trained easily on 8 V100 GPUs with 32GB of memory per GPU. However, flash attention has to be turned off, as it is not supported on V100 GPUs.

I hope this helps. Good luck! Please let me know if you face any issues.
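For anyone unsure whether their GPUs need flash attention turned off, the check can be sketched roughly as below. This is only an illustration, not the repository's own code: the helper names `pick_attn_implementation` and `detect_attn_implementation` are hypothetical, and it assumes the FlashAttention-2 requirement of an Ampere-or-newer GPU (compute capability 8.0+), which V100 (Volta, 7.0) does not meet.

```python
def pick_attn_implementation(major: int, minor: int = 0) -> str:
    """Return an attention backend name for a GPU with the given compute capability."""
    # Assumption: FlashAttention-2 kernels need compute capability 8.0 or higher
    # (and the flash-attn package installed). V100 reports 7.0, so it falls
    # back to PyTorch's built-in scaled-dot-product attention ("sdpa").
    return "flash_attention_2" if (major, minor) >= (8, 0) else "sdpa"


def detect_attn_implementation() -> str:
    """Query GPU 0 through PyTorch if possible; otherwise fall back to "eager"."""
    try:
        import torch
    except ImportError:
        return "eager"
    if not torch.cuda.is_available():
        return "eager"
    major, minor = torch.cuda.get_device_capability(0)
    return pick_attn_implementation(major, minor)


if __name__ == "__main__":
    print("V100 (7.0):", pick_attn_implementation(7, 0))
    print("A100 (8.0):", pick_attn_implementation(8, 0))
    print("This machine:", detect_attn_implementation())
```

In Hugging Face Transformers, a string like this can be passed as `attn_implementation=` to `from_pretrained`. Whether the VideoGPT+ training scripts expose the same switch is not stated in this thread, so please check the scripts for the actual option.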

Answer 5 · 2024-06-16T07:14:10.000Z

> @mmaaz60 In the first picture, VideoGPT+ surpasses VideoChat2 by a clear margin, but VideoChat2 with Mistral actually gets the better result as of now.
>
> These days, video MLLMs don't really seem to care about which LLM size they use...

Hi @lucasjinreal,

Thank you for your interest in our work. VideoGPT+ uses the Phi-3-mini LLM with only 3.8B parameters, which is relatively weak compared to Mistral-7B.

On the other hand, if we compare the Vicuna-7B-based variants of VideoGPT+ and VideoChat2, VideoChat2 obtains an average of 51.1 on MVBench, while our Vicuna-7B-based variant obtains 53.1.

Further, there are gains on the VCGBench and VCGBench-Diverse evaluations as well.

We acknowledge that VideoChat2 is a strong video conversation model. However, VideoGPT+ obtains better results on multiple benchmarks, as discussed in our technical report, and all the code to reproduce our reported results is released on GitHub.
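To keep the two comparisons in this thread apart, the MVBench averages quoted above can be lined up in a few lines of Python. This is just a sketch using the four numbers from the replies. The dictionary keys and the `gap` helper are made up for illustration.

```python
# MVBench averages quoted in this thread: (VideoGPT+ score, VideoChat2 score)
mvbench = {
    "Vicuna-7B for both models": (53.1, 51.1),
    "Phi-3-mini-3.8B vs Mistral-7B": (58.7, 60.4),
}


def gap(videogpt_plus: float, videochat2: float) -> float:
    """Signed difference in points; positive means VideoGPT+ is ahead."""
    return round(videogpt_plus - videochat2, 1)


gaps = {setting: gap(*scores) for setting, scores in mvbench.items()}

for setting, delta in gaps.items():
    print(f"{setting}: {delta:+.1f}")
```

With the same 7B LLM on both sides, VideoGPT+ is ahead by 2.0 points. The 1.7-point deficit only appears when the 3.8B Phi-3-mini variant is compared against the Mistral-7B variant of VideoChat2.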