kssteven418/LTP

Does attention mask reduce computation cost?

brotherb opened this issue · 1 comment

Hey there.
After reading the code, I am confused about how the computation cost can actually be reduced by masking more tokens. Did I miss anything?
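
To make my confusion concrete, here is a toy single-head attention sketch in plain PyTorch (my own example, not the LTP code): with a mask, the pruned positions get zero attention weight, but the matmuls still run over the full sequence length, so the raw compute looks unchanged unless the tokens are physically removed.

```python
import math
import torch

def masked_attention(q, k, v, mask):
    # q, k, v: (batch, seq_len, dim); mask: (batch, seq_len), 0 marks "pruned" tokens
    scores = q @ k.transpose(-1, -2) / math.sqrt(q.size(-1))  # (batch, seq_len, seq_len)
    scores = scores.masked_fill(mask[:, None, :] == 0, float("-inf"))
    attn = scores.softmax(dim=-1)
    # Both matmuls above still cover all seq_len positions, masked or not.
    return attn @ v

q = k = v = torch.randn(1, 128, 64)
mask = torch.ones(1, 128)
mask[:, 64:] = 0  # "prune" the second half of the tokens via the mask only
out = masked_attention(q, k, v, mask)
print(out.shape)  # torch.Size([1, 128, 64]) -- tensor shapes (and matmul cost) unchanged
```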

P.S. I see that the FLOPs are calculated from the number of tokens retained at each layer, which is counted during inference.
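
For reference, this is roughly how I understand that accounting (my own sketch with assumed BERT-base dimensions and made-up retained lengths, not the repo's exact formula): the FLOPs only go down if the retained sequence length itself shrinks layer by layer.

```python
def layer_flops(seq_len, hidden=768, ffn=3072):
    """Rough FLOPs for one transformer encoder layer at a given retained length."""
    proj = 4 * 2 * seq_len * hidden * hidden   # Q, K, V, and output projections
    attn = 2 * 2 * seq_len * seq_len * hidden  # QK^T and attention-times-V matmuls
    mlp = 2 * 2 * seq_len * hidden * ffn       # FFN up- and down-projection
    return proj + attn + mlp

# Hypothetical retained lengths per layer, as counted during inference
retained = [128, 128, 112, 96, 80, 64, 48, 40, 32, 24, 16, 12]
total = sum(layer_flops(n) for n in retained)
print(f"{total / 1e9:.2f} GFLOPs")
```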

Do you have inference latency metrics?