Dao-AILab/flash-attention

How to use FlashAttention to speed up inference in BERT-like models

pradeepdev-1995 opened this issue · 1 comment

Does FlashAttention support BERT-like models (BERT, DistilBERT, RoBERTa, etc.) for reducing inference latency?

If it does, how can FlashAttention be used to speed up inference in BERT-like models?
Please share sample code.

Please search for "bert" in the repo. If you don't put in effort I don't think you'll get help.
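(For reference: the repo ships its own optimized BERT implementation under flash_attn/models/bert.py, with usage shown in tests/models/test_bert.py, which is what the reply above points to. The snippet below is only a minimal sketch of the lower-level route: calling flash_attn_func inside a BERT-style self-attention block. The FlashSelfAttention class is hypothetical and written for illustration; it ignores padding masks and assumes fp16/bf16 inputs on a CUDA GPU supported by FlashAttention.)

```python
# Minimal sketch (not the repo's own BERT integration): replacing the
# softmax(QK^T / sqrt(d)) V computation of a BERT-style self-attention
# layer with flash_attn_func. FlashSelfAttention is a hypothetical module.
import torch
import torch.nn as nn

from flash_attn import flash_attn_func  # expects (batch, seqlen, nheads, headdim) inputs


class FlashSelfAttention(nn.Module):
    """Hypothetical drop-in for a BERT-style self-attention layer."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 12):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size)
        self.out = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seqlen, hidden_size), fp16/bf16, on a supported CUDA GPU
        b, s, _ = x.shape
        qkv = self.qkv(x).view(b, s, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.unbind(dim=2)  # each (batch, seqlen, nheads, headdim)
        # causal=False because BERT-style encoders use bidirectional attention
        ctx = flash_attn_func(q, k, v, dropout_p=0.0, causal=False)
        return self.out(ctx.reshape(b, s, -1))


# Example usage (assumes fp16 weights and inputs on a CUDA GPU):
attn = FlashSelfAttention().cuda().half().eval()
x = torch.randn(2, 128, 768, device="cuda", dtype=torch.float16)
with torch.no_grad():
    y = attn(x)  # (2, 128, 768)
```

For padded batches, the varlen variant (flash_attn_varlen_func) is the usual choice instead of flash_attn_func; this sketch skips padding handling for brevity. For a full model, the repo's flash_attn/models/bert.py and tests/models/test_bert.py show how to load HuggingFace BERT checkpoints into the optimized implementation.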