Dao-AILab/flash-attention

I have flash attention installed, but I get "ImportError: Flash Attention 2.0 is not available."

luisegehaijing opened this issue · 1 comment

[screenshot of the error traceback, uploaded 2024-06-13]
Can anyone help?

One of the good things about OSS is ease of debugging - you can see what's going wrong by reading the library's source code!
In this case, the transformers library's is_flash_attn_2_available() is returning False:

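# Excerpt from transformers' utils/import_utils.py. The helpers it calls
# (is_torch_available, _is_package_available) and the imports it relies on
# (importlib.metadata, packaging's `version`) live elsewhere in that module.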
def is_flash_attn_2_available():
    if not is_torch_available():
        return False

    if not _is_package_available("flash_attn"):
        return False

    # Let's add an extra check to see if cuda is available
    import torch

    if not torch.cuda.is_available():
        return False

    if torch.version.cuda:
        return version.parse(importlib.metadata.version("flash_attn")) >= version.parse("2.1.0")
    elif torch.version.hip:
        # TODO: Bump the requirement to 2.1.0 once released in https://github.com/ROCmSoftwarePlatform/flash-attention
        return version.parse(importlib.metadata.version("flash_attn")) >= version.parse("2.0.4")
    else:
        return False
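You can run the same check yourself to confirm what transformers sees; in recent versions the function is exported from transformers.utils:

from transformers.utils import is_flash_attn_2_available

# Prints True only when torch, CUDA, and a recent-enough flash_attn are all visible
print(is_flash_attn_2_available())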

So make sure that:

  • PyTorch is available (import torch)
  • Flash Attention is available (import flash_attn)
  • PyTorch can see CUDA (torch.cuda.is_available() == True). If it can't, check out this StackOverflow question
  • Check the version of flash_attn (flash_attn.__version__ should be >= 2.1.0, assuming you're using an NVIDIA GPU; for ROCm the function above only requires >= 2.0.4). The sketch below walks through all of these checks in one go.
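If it isn't obvious which check is failing, here is a minimal diagnostic sketch that mirrors the logic of the function above (it assumes the CUDA path; use the 2.0.4 floor instead on a ROCm build):

import importlib.metadata

from packaging import version

import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("built with CUDA:", torch.version.cuda)
print("built with ROCm:", torch.version.hip)

try:
    flash_attn_version = importlib.metadata.version("flash_attn")
    print("flash_attn version:", flash_attn_version)
    # NVIDIA (CUDA) builds need >= 2.1.0; ROCm builds need >= 2.0.4
    print("flash_attn new enough:", version.parse(flash_attn_version) >= version.parse("2.1.0"))
except importlib.metadata.PackageNotFoundError:
    print("flash_attn is not installed in this environment")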