EleutherAI/oslo

Change bert model to use `_fused_scale_mask_softmax` functions

loopinf opened this issue · 2 comments

Describe a TODO feature

  • The current implementation in `oslo/transformers/models/bert/modeling_bert.py` does not use the kernel-fusion function `_fused_scale_mask_softmax`.
  • Switch it to the fused kernel where possible.

Assignees

  • loopinf

What would be the best way to check the improvement from using `fused_scale_mask_softmax`?
Do I need to train for some epochs to see an actual difference? There is no test case measuring the speed improvement in the Megatron repo.
Any good ideas?
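Training for epochs shouldn't be necessary: a micro-benchmark that times just the scale-mask-softmax step usually shows the kernel-level difference directly. Below is a minimal sketch of the timing pattern using Python's `timeit`, with a plain pure-Python softmax standing in for the real op (the function and inputs here are illustrative, not oslo's API; in practice you would time the unfused path against `_fused_scale_mask_softmax` on GPU tensors, wrapping the timed region with `torch.cuda.synchronize()`).

```python
import math
import timeit

def scale_mask_softmax(scores, mask, scale):
    # Reference (unfused) path: scale, mask, and softmax as separate steps,
    # mirroring the three ops the fused kernel would combine.
    scaled = [s * scale for s in scores]
    masked = [s if keep else float("-inf") for s, keep in zip(scaled, mask)]
    mx = max(masked)
    exps = [math.exp(s - mx) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

def bench(fn, *args, repeats=5, number=1000):
    # Best-of-N average wall-clock time per call; the same pattern applies
    # to timing the fused vs. unfused kernels (with CUDA, synchronize
    # before and after the timed region so GPU work is actually counted).
    timer = timeit.Timer(lambda: fn(*args))
    return min(timer.repeat(repeat=repeats, number=number)) / number

if __name__ == "__main__":
    scores = [0.1 * i for i in range(128)]
    mask = [i % 2 == 0 for i in range(128)]
    t = bench(scale_mask_softmax, scores, mask, 1.0 / 8.0)
    print(f"unfused scale-mask-softmax: {t * 1e6:.1f} us per call")
```

Running the same harness twice, once per implementation on identical inputs, gives a per-call speedup number without any training loop; it is also worth asserting that both paths return (numerically close) identical outputs.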


I found benchmarking functionality in transformers. It would be good to implement similar benchmarking here.