How to use the `_flash_attn_forward` function
lll143653 opened this issue · 1 comment
lll143653 commented
I'm new to flash-attn and am trying to study it. When I call `_flash_attn_forward` with a bias in flash-attention/flash_attn/flash_attn_triton.py:

```python
B, H, M, K = 4, 32, 32, 128
q = torch.randn([B, H, M, K], dtype=torch.float16).cuda()
k = torch.randn([B, H, M, K], dtype=torch.float16).cuda()
v = torch.randn([B, H, M, K], dtype=torch.float16).cuda()
bias = torch.randn([B, M, H, H], dtype=torch.float32).cuda()
print(_flash_attn_forward(q, k, v, bias))
```
I get a `Segmentation fault (core dumped)` error, and the crash seems to happen at line 506:

`qk = qk * softmax_scale + bias`

What should I do to add a bias to the flash-attn Triton kernel?
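To make sure I understand the bias semantics I'm aiming for, here is a plain NumPy sketch of scaled-dot-product attention with an additive bias. It assumes (if I read the docstring correctly) that the bias should broadcast to `(batch, nheads, seqlen_q, seqlen_k)`; `attention_ref` is my own name, not part of the library:

```python
import numpy as np

def attention_ref(q, k, v, bias=None, softmax_scale=None):
    """NumPy reference for attention with an additive bias.
    q, k, v: (batch, nheads, seqlen, headdim).
    bias: assumed to broadcast to (batch, nheads, seqlen_q, seqlen_k)."""
    headdim = q.shape[-1]
    if softmax_scale is None:
        softmax_scale = 1.0 / np.sqrt(headdim)
    # scores: (batch, nheads, seqlen_q, seqlen_k)
    scores = np.einsum("bhmk,bhnk->bhmn", q, k) * softmax_scale
    if bias is not None:
        scores = scores + bias  # additive bias, applied before softmax
    # numerically stable softmax over the key dimension
    scores = scores - scores.max(axis=-1, keepdims=True)
    p = np.exp(scores)
    p = p / p.sum(axis=-1, keepdims=True)
    return np.einsum("bhmn,bhnk->bhmk", p, v)

B, H, M, K = 4, 32, 32, 128
rng = np.random.default_rng(0)
q = rng.standard_normal((B, H, M, K))
k = rng.standard_normal((B, H, M, K))
v = rng.standard_normal((B, H, M, K))
# note the shape: (B, H, M, M), i.e. (batch, nheads, seqlen_q, seqlen_k),
# not (B, M, H, H) as in my repro above
bias = rng.standard_normal((B, H, M, M))
out = attention_ref(q, k, v, bias)
print(out.shape)  # (4, 32, 32, 128)
```

Is this the computation the Triton kernel performs, and is `(batch, nheads, seqlen_q, seqlen_k)` the bias layout it expects?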
lll143653 commented
My GPU is an NVIDIA A6000.