Dao-AILab/flash-attention

I have flash attention installed, but I get "ImportError: Flash Attention 2.0 is not available."

luisegehaijing opened this issue · 1 comment

[screenshot of the error traceback, uploaded 2024-06-13]
Can anyone help?

One of the good things about OSS is ease of debugging - you can see what's going wrong by reading the library's source code!
In this case, the transformers library's is_flash_attn_2_available() is returning False:

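# Excerpt from transformers' utils/import_utils.py. The helpers it calls
# (is_torch_available, _is_package_available) and the imports it relies on
# (importlib.metadata, packaging's `version`) live elsewhere in that module.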
def is_flash_attn_2_available():
    if not is_torch_available():
        return False

    if not _is_package_available("flash_attn"):
        return False

    # Let's add an extra check to see if cuda is available
    import torch

    if not torch.cuda.is_available():
        return False

    if torch.version.cuda:
        return version.parse(importlib.metadata.version("flash_attn")) >= version.parse("2.1.0")
    elif torch.version.hip:
        # TODO: Bump the requirement to 2.1.0 once released in https://github.com/ROCmSoftwarePlatform/flash-attention
        return version.parse(importlib.metadata.version("flash_attn")) >= version.parse("2.0.4")
    else:
        return False
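You can run the same check yourself to confirm what transformers sees; in recent versions the function is exported from transformers.utils:

from transformers.utils import is_flash_attn_2_available

# Prints True only when torch, CUDA, and a recent-enough flash_attn are all visible
print(is_flash_attn_2_available())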

So make sure that:

  • PyTorch is available (import torch)
  • Flash Attention is available (import flash_attn)
  • PyTorch can see CUDA (torch.cuda.is_available() == True). If it can't, check out this StackOverflow question
  • Check the version of flash_attn (flash_attn.__version__ should be >= 2.1.0, assuming you're using an NVIDIA GPU; for ROCm the function above only requires >= 2.0.4). The sketch below walks through all of these checks in one go.
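If it isn't obvious which check is failing, here is a minimal diagnostic sketch that mirrors the logic of the function above (it assumes the CUDA path; use the 2.0.4 floor instead on a ROCm build):

import importlib.metadata

from packaging import version

import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("built with CUDA:", torch.version.cuda)
print("built with ROCm:", torch.version.hip)

try:
    flash_attn_version = importlib.metadata.version("flash_attn")
    print("flash_attn version:", flash_attn_version)
    # NVIDIA (CUDA) builds need >= 2.1.0; ROCm builds need >= 2.0.4
    print("flash_attn new enough:", version.parse(flash_attn_version) >= version.parse("2.1.0"))
except importlib.metadata.PackageNotFoundError:
    print("flash_attn is not installed in this environment")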