Change BERT model to use `_fused_scale_mask_softmax` functions
loopinf opened this issue · 2 comments
loopinf commented
Describe a TODO feature
- The current implementation in oslo/transformers/models/bert/modeling_bert.py does not use the kernel fusion function.
- It should be changed to use kernel fusion where possible (a rough sketch of the idea follows below).
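A minimal sketch of what the change amounts to, assuming a fused helper with a signature like `_fused_scale_mask_softmax(scores, mask, scale)`; the actual name and signature should be taken from OSLO's kernel fusion module, not from this sketch:

```python
# Sketch only: BertSelfAttention currently scales, masks, and softmaxes the
# attention scores in separate steps; the goal is one fused kernel for all three.
import math
import torch

def unfused_attention_probs(scores, attention_mask, head_dim):
    # Current (unfused) path: separate ops and extra memory round-trips.
    scores = scores / math.sqrt(head_dim)
    if attention_mask is not None:
        scores = scores + attention_mask  # additive mask, large negative on padding
    return torch.nn.functional.softmax(scores, dim=-1)

def fused_attention_probs(scores, attention_mask, head_dim, fused_kernel):
    # Target path: `fused_kernel` stands in for the assumed
    # _fused_scale_mask_softmax helper taking scores, mask, and a scale factor.
    return fused_kernel(scores, attention_mask, scale=1.0 / math.sqrt(head_dim))
```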
Assignees
- loopinf
loopinf commented
What would be the best way to check the improvement from using `fused_scale_mask_softmax`?
Do I need to train for a few epochs to see an actual difference? There is no test case in the Megatron repo that measures the speed improvement.
Any good ideas?
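One option that would not require training is to time only the softmax path with a forward-pass micro-benchmark. A rough sketch, assuming BERT-base-like shapes and a CUDA device (only the measurement harness is shown; the fused helper is not imported here):

```python
# Rough micro-benchmark sketch (requires CUDA): time the scale+mask+softmax
# path in isolation; no training is needed to see a kernel-fusion difference.
import torch

def time_softmax_path(fn, scores, mask, iters=100, warmup=10):
    for _ in range(warmup):
        fn(scores, mask)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(scores, mask)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per iteration

# Assumed BERT-base-like shapes: (batch, heads, seq, seq) attention scores.
scores = torch.randn(8, 12, 128, 128, device="cuda", dtype=torch.float16)
mask = torch.zeros(8, 1, 1, 128, device="cuda", dtype=torch.float16)

unfused = lambda s, m: torch.softmax(s / 8.0 + m, dim=-1)  # 8.0 = sqrt(head_dim=64)
print("unfused:", time_softmax_path(unfused, scores, mask), "ms/iter")
# Run the same harness against the fused kernel once it is wired in.
```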
loopinf commented
I found benchmark functionality in transformers. It would be good to implement that benchmark functionality; a minimal example is sketched below.
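A hedged sketch using the transformers benchmark utilities (`PyTorchBenchmark` / `PyTorchBenchmarkArguments`); these shipped with transformers at the time but were later deprecated, so availability depends on the installed version:

```python
# Sketch using the (now deprecated) transformers benchmark utilities to measure
# inference speed and memory for BERT at a few batch sizes / sequence lengths.
from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

args = PyTorchBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[128, 512],
)
benchmark = PyTorchBenchmark(args)
results = benchmark.run()  # prints time and memory tables per model and shape
```

Running the same configuration before and after switching modeling_bert.py to the fused kernel should make any speed difference visible without training.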