Yxxxb/VoCo-LLaMA

Flash attention and attention mask modification. Does the model support flash attention?

Closed this issue · 1 comment

Dear authors,
first of all, congrats on your idea and paper!

I have a question about the code. I see here

if self._use_flash_attention_2:
that in the flash attention branch you do not modify the attention mask. Is this expected?
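
For context, here is a minimal sketch of the branching I mean, assuming the mask handling mirrors Hugging Face transformers' `LlamaModel` (paraphrased, not this repo's exact code):

```python
import torch

# Hypothetical 2d padding mask of shape (batch, seq_len); 1 = real token.
attention_mask_2d = torch.ones(1, 6, dtype=torch.long)
use_flash_attention_2 = True

if use_flash_attention_2:
    # Flash attention path: the 2d mask is forwarded as-is
    # (or dropped entirely when there is no padding).
    prepared_mask = attention_mask_2d if (0 in attention_mask_2d) else None
else:
    # SDPA / eager path: the mask is expanded to a 4d causal mask,
    # which is where per-position modifications can be applied.
    seq_len = attention_mask_2d.shape[1]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    prepared_mask = causal[None, None] & attention_mask_2d[:, None, None, :].bool()
```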

thanks

Hi,

Thank you for your interest.
Since we need a 4D attention mask, but the open-source flash attention implementation only supports a 2D causal attention mask, we chose the standard SDPA implementation and modified the attention mask on top of that.
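
For illustration, here is a minimal sketch of why SDPA works here: PyTorch's `scaled_dot_product_attention` accepts an arbitrary 4D mask, whereas the flash attention kernels only take a 2D padding mask. The shapes and the mask edit below are hypothetical placeholders, not the actual training code:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration only.
bsz, n_heads, seq_len, head_dim = 1, 8, 16, 64
q = torch.randn(bsz, n_heads, seq_len, head_dim)
k = torch.randn(bsz, n_heads, seq_len, head_dim)
v = torch.randn(bsz, n_heads, seq_len, head_dim)

# Start from a standard causal mask (True = may attend), then edit
# individual (query, key) entries, e.g. to control which positions can
# see the compressed VoCo tokens. This per-entry control is exactly
# what a 2d padding mask cannot express.
allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
allowed = allowed.expand(bsz, n_heads, seq_len, seq_len).clone()
# ... custom 4d mask modifications would go here ...

# SDPA consumes the full 4d boolean (or additive float) mask directly.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=allowed)
```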