How to use FlashAttention to speed up inference in BERT-like models
pradeepdev-1995 opened this issue · 1 comment
pradeepdev-1995 commented
Does FlashAttention support BERT-like models (BERT, DistilBERT, RoBERTa, etc.) for reducing inference latency?
If it does, how can FlashAttention be used to speed up inference in these models?
Please share sample code.
tridao commented
Please search for "bert" in the repo. If you don't put in effort, I don't think you'll get help.
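
For reference, a minimal sketch of calling the FlashAttention kernel directly via `flash_attn_func` from the `flash_attn` package. This is not the repo's full BERT integration (the repo ships its own BERT model code, which is what the comment above points to); it only illustrates the core call. It assumes a CUDA GPU, the `flash-attn` package installed, and fp16 tensors; BERT-style attention is bidirectional, so `causal=False`:

```python
# Minimal sketch: invoking the FlashAttention kernel on BERT-shaped inputs.
# Assumptions: CUDA GPU, flash-attn installed, fp16/bf16 inputs.
import torch
from flash_attn import flash_attn_func

# flash_attn_func expects q, k, v of shape (batch, seqlen, nheads, headdim).
batch, seqlen, nheads, headdim = 2, 512, 12, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# BERT-like models use bidirectional (non-causal) attention.
out = flash_attn_func(q, k, v, causal=False)  # (batch, seqlen, nheads, headdim)
print(out.shape)
```

To use this inside an actual BERT/DistilBERT/RoBERTa model, the attention layers would need to be swapped to call this kernel (or a FlashAttention-backed model implementation used); the snippet above only demonstrates the kernel interface itself.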