Yxxxb/VoCo-LLaMA

Flash attention and attention mask modification. Does the model support flash attention?

Closed this issue · 1 comment

Dear authors,
first of all, congrats on your idea and paper!

I have a question about the code. I see here

if self._use_flash_attention_2:
that in the flash attention branch you do not modify the attention mask. Is this expected?
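
For context, here is a minimal sketch of the branching I mean, assuming the mask handling mirrors Hugging Face transformers' `LlamaModel` (paraphrased, not this repo's exact code):

```python
import torch

# Hypothetical 2d padding mask of shape (batch, seq_len); 1 = real token.
attention_mask_2d = torch.ones(1, 6, dtype=torch.long)
use_flash_attention_2 = True

if use_flash_attention_2:
    # Flash attention path: the 2d mask is forwarded as-is
    # (or dropped entirely when there is no padding).
    prepared_mask = attention_mask_2d if (0 in attention_mask_2d) else None
else:
    # SDPA / eager path: the mask is expanded to a 4d causal mask,
    # which is where per-position modifications can be applied.
    seq_len = attention_mask_2d.shape[1]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    prepared_mask = causal[None, None] & attention_mask_2d[:, None, None, :].bool()
```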

thanks

Hi,

Thank you for your interest.
Since we need a 4D attention mask, but the open-source flash attention implementation only supports a 2D causal attention mask, we chose the standard SDPA implementation and modified the attention mask on top of that.
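
For illustration, here is a minimal sketch of why SDPA works here: PyTorch's `scaled_dot_product_attention` accepts an arbitrary 4D mask, whereas the flash attention kernels only take a 2D padding mask. The shapes and the mask edit below are hypothetical placeholders, not the actual training code:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration only.
bsz, n_heads, seq_len, head_dim = 1, 8, 16, 64
q = torch.randn(bsz, n_heads, seq_len, head_dim)
k = torch.randn(bsz, n_heads, seq_len, head_dim)
v = torch.randn(bsz, n_heads, seq_len, head_dim)

# Start from a standard causal mask (True = may attend), then edit
# individual (query, key) entries, e.g. to control which positions can
# see the compressed VoCo tokens. This per-entry control is exactly
# what a 2d padding mask cannot express.
allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
allowed = allowed.expand(bsz, n_heads, seq_len, seq_len).clone()
# ... custom 4d mask modifications would go here ...

# SDPA consumes the full 4d boolean (or additive float) mask directly.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=allowed)
```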