Change BERT model to use `_fused_scale_mask_softmax` functions
loopinf opened this issue · 2 comments
loopinf commented
Describe a TODO feature
- The current implementation in oslo/transformers/models/bert/modeling_bert.py does not use the kernel fusion function.
- It should be changed to use kernel fusion where possible (a rough sketch of the idea follows below).
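A minimal sketch of what the change amounts to, assuming a fused helper with a signature like `_fused_scale_mask_softmax(scores, mask, scale)`; the actual name and signature should be taken from OSLO's kernel fusion module, not from this sketch:

```python
# Sketch only: BertSelfAttention currently scales, masks, and softmaxes the
# attention scores in separate steps; the goal is one fused kernel for all three.
import math
import torch

def unfused_attention_probs(scores, attention_mask, head_dim):
    # Current (unfused) path: separate ops and extra memory round-trips.
    scores = scores / math.sqrt(head_dim)
    if attention_mask is not None:
        scores = scores + attention_mask  # additive mask, large negative on padding
    return torch.nn.functional.softmax(scores, dim=-1)

def fused_attention_probs(scores, attention_mask, head_dim, fused_kernel):
    # Target path: `fused_kernel` stands in for the assumed
    # _fused_scale_mask_softmax helper taking scores, mask, and a scale factor.
    return fused_kernel(scores, attention_mask, scale=1.0 / math.sqrt(head_dim))
```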
Assignees
- loopinf
loopinf commented
What would be the best way to check the improvement from using `fused_scale_mask_softmax`?
Do I need to train for a few epochs to see an actual difference? There is no test case in the Megatron repo that measures the speed improvement.
Any good ideas?
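One option that would not require training is to time only the softmax path with a forward-pass micro-benchmark. A rough sketch, assuming BERT-base-like shapes and a CUDA device (only the measurement harness is shown; the fused helper is not imported here):

```python
# Rough micro-benchmark sketch (requires CUDA): time the scale+mask+softmax
# path in isolation; no training is needed to see a kernel-fusion difference.
import torch

def time_softmax_path(fn, scores, mask, iters=100, warmup=10):
    for _ in range(warmup):
        fn(scores, mask)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(scores, mask)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per iteration

# Assumed BERT-base-like shapes: (batch, heads, seq, seq) attention scores.
scores = torch.randn(8, 12, 128, 128, device="cuda", dtype=torch.float16)
mask = torch.zeros(8, 1, 1, 128, device="cuda", dtype=torch.float16)

unfused = lambda s, m: torch.softmax(s / 8.0 + m, dim=-1)  # 8.0 = sqrt(head_dim=64)
print("unfused:", time_softmax_path(unfused, scores, mask), "ms/iter")
# Run the same harness against the fused kernel once it is wired in.
```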
loopinf commented
I found benchmark functionality in transformers. It would be good to implement that benchmark functionality; a minimal example is sketched below.
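A hedged sketch using the transformers benchmark utilities (`PyTorchBenchmark` / `PyTorchBenchmarkArguments`); these shipped with transformers at the time but were later deprecated, so availability depends on the installed version:

```python
# Sketch using the (now deprecated) transformers benchmark utilities to measure
# inference speed and memory for BERT at a few batch sizes / sequence lengths.
from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

args = PyTorchBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[128, 512],
)
benchmark = PyTorchBenchmark(args)
results = benchmark.run()  # prints time and memory tables per model and shape
```

Running the same configuration before and after switching modeling_bert.py to the fused kernel should make any speed difference visible without training.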