pkunlp-icler/FastV

What change did you make at generation time to make "use_cache=False" valid?

Evil-Genius-HY opened this issue · 4 comments

I just found that llama will automatically set "use_cache=True" when you use "inputs_embeds" as model inputs, which is exactly what llava uses at generation time. Did you make any special changes to make "use_cache=False" valid during generation?

Hi, we added "use_cache=False" to the generation config without changing anything special for it.

Hello, thanks for your reply. I just found that a generation function has been added to the llava model in the latest version, which makes the setting "use_cache=False" invalid during generation. The input to this generation function is "inputs_embeds", which causes the transformers generation function to automatically set "use_cache=True" before the generation process. Therefore, anyone trying to use the latest llava model to reproduce your results will run into this problem.

I think you may be right if that change was made to the latest official LLaVA repo. For me, though, "inputs_embeds" does not conflict with "use_cache=False". If you want to set "use_cache=False", use the LLaVA code from this repo instead of the latest one. If you want to set "use_cache=True", use our code that is implemented with huggingface-llava.
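The conflict discussed in this thread can be sketched as a tiny mock of the kwarg resolution the reporter describes. This is an illustrative assumption only, not the actual transformers source: it just models the reported behavior that passing "inputs_embeds" to generate() forces the KV cache on, silently overriding a "use_cache=False" setting in the generation config.

```python
# Simplified mock of the behavior reported in this thread.
# NOTE: illustrative assumption only -- this is NOT the actual
# transformers implementation, just the effect the thread describes.
def resolve_use_cache(config_use_cache, inputs_embeds=None):
    """Return the effective use_cache value generate() would end up using."""
    if inputs_embeds is not None:
        # Reported behavior: supplying inputs_embeds (as LLaVA does at
        # generation time) forces the cache on, overriding the config.
        return True
    return config_use_cache

# With plain token ids, the configured value is respected:
print(resolve_use_cache(False))                               # -> False
# With inputs_embeds, the configured value is silently overridden:
print(resolve_use_cache(False, inputs_embeds=[[0.1, 0.2]]))   # -> True
```

This makes the maintainer's suggestion concrete: to keep "use_cache=False" effective, use a code path whose generate call does not route through the inputs_embeds-based override.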