kssteven418/LTP

Does attention mask reduce computation cost?

brotherb opened this issue · 1 comment

Hey there.
After reading the code, I am confused about how the computation cost can actually be reduced by masking more tokens. Did I miss anything?
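
To make my confusion concrete, here is a toy single-head attention sketch in plain PyTorch (my own example, not the LTP code): with a mask, the pruned positions get zero attention weight, but the matmuls still run over the full sequence length, so the raw compute looks unchanged unless the tokens are physically removed.

```python
import math
import torch

def masked_attention(q, k, v, mask):
    # q, k, v: (batch, seq_len, dim); mask: (batch, seq_len), 0 marks "pruned" tokens
    scores = q @ k.transpose(-1, -2) / math.sqrt(q.size(-1))  # (batch, seq_len, seq_len)
    scores = scores.masked_fill(mask[:, None, :] == 0, float("-inf"))
    attn = scores.softmax(dim=-1)
    # Both matmuls above still cover all seq_len positions, masked or not.
    return attn @ v

q = k = v = torch.randn(1, 128, 64)
mask = torch.ones(1, 128)
mask[:, 64:] = 0  # "prune" the second half of the tokens via the mask only
out = masked_attention(q, k, v, mask)
print(out.shape)  # torch.Size([1, 128, 64]) -- tensor shapes (and matmul cost) unchanged
```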

P.S. I see that the FLOPs are calculated from the number of tokens retained at each layer, which is counted during inference.
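
For reference, this is roughly how I understand that accounting (my own sketch with assumed BERT-base dimensions and made-up retained lengths, not the repo's exact formula): the FLOPs only go down if the retained sequence length itself shrinks layer by layer.

```python
def layer_flops(seq_len, hidden=768, ffn=3072):
    """Rough FLOPs for one transformer encoder layer at a given retained length."""
    proj = 4 * 2 * seq_len * hidden * hidden   # Q, K, V, and output projections
    attn = 2 * 2 * seq_len * seq_len * hidden  # QK^T and attention-times-V matmuls
    mlp = 2 * 2 * seq_len * hidden * ffn       # FFN up- and down-projection
    return proj + attn + mlp

# Hypothetical retained lengths per layer, as counted during inference
retained = [128, 128, 112, 96, 80, 64, 48, 40, 32, 24, 16, 12]
total = sum(layer_flops(n) for n in retained)
print(f"{total / 1e9:.2f} GFLOPs")
```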

Do you have inference latency metrics?