Dao-AILab/flash-attention

How to use the `_flash_attn_forward` function

lll143653 opened this issue · 1 comment

I'm new to flash-attn and am trying to study it. I call `_flash_attn_forward` with a bias from flash-attention/flash_attn/flash_attn_triton.py with the following code:

```python
import torch
from flash_attn.flash_attn_triton import _flash_attn_forward

# B = batch, H = heads, M = sequence length, K = head dimension
B, H, M, K = 4, 32, 32, 128
q = torch.randn([B, H, M, K], dtype=torch.float16).cuda()
k = torch.randn([B, H, M, K], dtype=torch.float16).cuda()
v = torch.randn([B, H, M, K], dtype=torch.float16).cuda()
bias = torch.randn([B, M, H, H], dtype=torch.float32).cuda()

print(_flash_attn_forward(q, k, v, bias))
```
I get a Segmentation fault (core dumped) error, and the crash seems to happen at line 506:
`qk = qk * softmax_scale + bias`.

What should I do to pass a bias to the flash-attn Triton implementation?

My GPU is an NVIDIA A6000.
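
For reference, below is a minimal sketch of how I am guessing the call is supposed to look. I am assuming q/k/v are laid out as (batch, seqlen, nheads, headdim) and that the bias has to be broadcastable to (batch, nheads, seqlen_q, seqlen_k) in the same dtype as q; please correct me if those assumptions are wrong.

```python
import torch
from flash_attn.flash_attn_triton import _flash_attn_forward

# Assumed layout: q/k/v are (batch, seqlen, nheads, headdim).
B, S, H, D = 4, 32, 32, 128
q = torch.randn(B, S, H, D, dtype=torch.float16, device="cuda")
k = torch.randn(B, S, H, D, dtype=torch.float16, device="cuda")
v = torch.randn(B, S, H, D, dtype=torch.float16, device="cuda")

# Assumed bias layout: broadcastable to (batch, nheads, seqlen_q, seqlen_k),
# i.e. one value per (head, query position, key position), same dtype as q.
bias = torch.randn(B, H, S, S, dtype=torch.float16, device="cuda")

out = _flash_attn_forward(q, k, v, bias)
print(out)
```

Is that the right way to construct the bias, or is the segfault coming from something else?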