labmlai/annotated_deep_learning_paper_implementations

Question about RotaryPEMultiHeadAttention: rope_percentage

YOONSEOKHEO opened this issue · 0 comments

I noticed that the RotaryPEMultiHeadAttention class takes a parameter called rope_percentage, which reduces the number of dimensions that the rotary embeddings are applied to.
(URL:

)
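
To make the question concrete, here is a minimal sketch of how I understand the parameter. It is not the repository's code: the function name `partial_rope`, the `base` argument, and the pairing of feature `i` with feature `i + d_rope / 2` are my own assumptions for illustration. Only the first `d_rope = int(d_k * rope_percentage)` features of a head vector are rotated, and the remaining features pass through unchanged.

```python
import math


def partial_rope(x, position, rope_percentage=1.0, base=10_000):
    """Rotate the first d_rope features of one head vector; keep the rest as-is.

    Hypothetical helper for illustration only, written in plain Python.
    """
    d_k = len(x)
    d_rope = int(d_k * rope_percentage)
    d_rope -= d_rope % 2  # assumption: rotate an even number of features
    half = d_rope // 2

    out = list(x)  # features at index >= d_rope are copied through untouched
    for i in range(half):
        # Each feature pair gets its own frequency, scaled by the token position.
        theta = position * base ** (-2 * i / d_rope)
        a, b = x[i], x[i + half]
        out[i] = a * math.cos(theta) - b * math.sin(theta)
        out[i + half] = b * math.cos(theta) + a * math.sin(theta)
    return out


x = [1.0, 2.0, 3.0, 4.0]
full = partial_rope(x, position=3, rope_percentage=1.0)     # all 4 features rotated
partial = partial_rope(x, position=3, rope_percentage=0.5)  # only the first 2 rotated
```

With `rope_percentage=0.5` the last two features of `x` carry no positional information, which is the behaviour I am asking about.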

In what cases would you set rope_percentage to a value less than 1?

(In experiment.py, I confirmed that rope_percentage is set to 1.0.)