DAMO-NLP-SG/VCD

Unable to reproduce the results of LLaVA

frankRenlf opened this issue · 10 comments

I've followed the process, but I'm three points under. What other details do I need to pay attention to in order to reproduce the results of the paper?

Hi, may I know more details about your experiments so that we can help you better?

The tables report a std, e.g. (±0.42). What causes the results to fluctuate? With a fixed seed, my results stay the same.
Also, I found that the regular-decoding results produced by the code in your repo are higher than the results in your paper.

The fluctuation may come from the random sampling process, depending on the sampling strategy. In addition, adding random Gaussian noise may also introduce randomness.

For regular decoding, are you using a newer version of LLaVA?

Thanks for your reply. Yes, I'm just using LLaVA 7B.
I'd also like to know where the std comes from.

Hi, the decoding strategy for the main table is always direct sampling without any constraints (i.e., no top-p or temperature normalization). The std may come from the sampling process or from the randomness introduced when adding Gaussian noise to obtain the noised image.
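For readers unfamiliar with the term, here is a minimal sketch of what "direct sampling without any constraints" usually means: drawing the next token from the full softmax distribution, with no top-p truncation and no temperature rescaling. This is a toy illustration in plain Python, not the repo's decoding code; the names `softmax`, `direct_sample`, and `sample_sequence` and the example logits are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def direct_sample(logits, rng):
    """Draw one token index from the full distribution.

    No top-p truncation and no temperature rescaling are applied.
    """
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_sequence(logits, seed, length=20):
    """Hypothetical helper: repeated draws using a seeded generator."""
    rng = random.Random(seed)
    return [direct_sample(logits, rng) for _ in range(length)]


logits = [2.0, 1.0, 0.1]  # toy logits, assumed for illustration
run_a = sample_sequence(logits, seed=0)
run_b = sample_sequence(logits, seed=0)
print(run_a == run_b)  # the same seed reproduces the same draws
```

Because every draw depends on the generator state, fixing the seed reproduces the output exactly, while changing the seed generally changes the sampled tokens.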

Hi, but regular decoding also has a std in the results. What causes that? When I set the seed, the results do not change. So does the std come from the sampling process, with the differences arising from different seeds?

Yep, correct. When the seed is different, both the sampling process during decoding and the addition of Gaussian noise to the original image introduce randomness, which may cause the std.
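To make the seed argument concrete, the sketch below runs a toy evaluation under several seeds and reports the mean and std in the "(±x.xx)" style. Everything here is an assumption for illustration: `toy_eval` merely simulates a seed-dependent score, the seed list is arbitrary, and the sample standard deviation is used, which may differ from how the paper computed its numbers.

```python
import random
import statistics


def toy_eval(seed, n_questions=500, p_correct=0.8):
    """Hypothetical stand-in for one evaluation run.

    Each question is answered correctly with probability p_correct, so
    the score depends only on the seed, much like sampled decoding.
    """
    rng = random.Random(seed)
    correct = sum(rng.random() < p_correct for _ in range(n_questions))
    return 100.0 * correct / n_questions


seeds = [0, 1, 2, 3, 4]  # arbitrary seeds, assumed for illustration
scores = [toy_eval(s) for s in seeds]

mean = statistics.mean(scores)
std = statistics.stdev(scores)  # sample std; the paper's choice is unknown
print(f"{mean:.2f} (±{std:.2f})")
```

A single fixed seed always gives the same score, so the spread only becomes visible when the run is repeated under several seeds.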

Hi, I still don't know which version of LLaVA to use to reproduce the results. Could you share the Hugging Face link?

https://huggingface.co/liuhaotian/llava-v1.5-7b

Hi Sicong, I have a question: why are the results reported in the paper much lower than the performance reported in the original LLaVA-1.5 paper?

For example, the original LLaVA-1.5 paper reports POPE F1 scores on the three splits (Ran, Adv, and Pop) of 87.3, 86.1, and 84.2.

But in your paper, the results are 81.33, 77.57, and 80.06.
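When comparing F1 numbers across papers, it can help to recompute the metric from raw yes/no answers. The sketch below assumes "yes" is the positive class, as is common for POPE-style binary questions. The function name `f1_score` and the toy data are hypothetical and not taken from either paper's evaluation script.

```python
def f1_score(preds, labels, positive="yes"):
    """F1 for binary yes/no answers, with `positive` as the positive class."""
    pairs = list(zip(preds, labels))
    tp = sum(p == positive and l == positive for p, l in pairs)
    fp = sum(p == positive and l != positive for p, l in pairs)
    fn = sum(p != positive and l == positive for p, l in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: one true positive, one false positive, one false negative.
preds = ["yes", "yes", "no", "no"]
labels = ["yes", "no", "yes", "no"]
print(f"{100 * f1_score(preds, labels):.2f}")  # prints 50.00
```

Under sampled decoding the predictions change with the seed, so an F1 computed this way would vary from run to run as well.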