lucidrains/x-transformers

[Question] Why is RotaryEmbedding not used when cross attending?

pfeatherstone opened this issue · 1 comment

Why is `RotaryEmbedding` not applied when cross-attending? In the attention forward pass, rotary embeddings are only applied when no context is passed:

```python
# rotary embeddings are skipped entirely whenever a context is given (cross-attention)
if exists(rotary_pos_emb) and not has_context:
    freqs, xpos_scale = rotary_pos_emb
    # xPos: queries get the scale, keys get its reciprocal
    q_xpos_scale, k_xpos_scale = (xpos_scale, xpos_scale ** -1.) if exists(xpos_scale) else (1., 1.)
    q = apply_rotary_pos_emb(q, freqs, q_xpos_scale)
    k = apply_rotary_pos_emb(k, freqs, k_xpos_scale)
    if self.rotary_embed_values:
        v = apply_rotary_pos_emb(v, freqs, k_xpos_scale)
```
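A plausible explanation (not confirmed in this thread) is that rotary embeddings only encode *relative* position: if a query at position m and a key at position n are rotated by angles proportional to m and n, their dot product depends only on m - n. The same holds for the xPos variant in the snippet, because queries are multiplied by the scale and keys by its reciprocal. The following is a toy pure-Python sketch of that property; `rotate`, `xpos_factors` and `embed` are hypothetical helpers, not the x-transformers API, and the dimension pairing and decay formula are assumptions modelled on the RoPE/xPos formulations.

```python
import math

def rotate(vec, pos, base=10000.0):
    # Toy RoPE: rotate each pair (vec[i], vec[i+1]) by angle pos * base**(-i/dim).
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out

def xpos_factors(dim, pos, scale_base=512.0):
    # Assumed xPos-style decay: one factor per pair, raised to pos / scale_base.
    return [((i + 0.4 * dim) / (1.4 * dim)) ** (pos / scale_base) for i in range(0, dim, 2)]

def embed(vec, pos, power):
    # Rotate, then scale each pair by factor**power (+1 for queries, -1 for keys).
    factors = xpos_factors(len(vec), pos)
    return [x * factors[j // 2] ** power for j, x in enumerate(rotate(vec, pos))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.8, 0.5]
k = [1.1, 0.4, -0.7, 0.2]

# Plain rotary: same offset m - n = 3, but shifted by 100 positions.
rope_near = dot(rotate(q, 5), rotate(k, 2))
rope_far = dot(rotate(q, 105), rotate(k, 102))

# Rotary + xPos scaling: queries scaled by the factor, keys by its reciprocal.
xpos_near = dot(embed(q, 5, 1), embed(k, 2, -1))
xpos_far = dot(embed(q, 105, 1), embed(k, 102, -1))

print(rope_near - rope_far, xpos_near - xpos_far)
```

Both differences are zero up to floating-point error. This only makes sense when m and n index the same sequence, as in self-attention, which is also why a single `freqs` is applied to both `q` and `k` above. In cross-attention the queries come from `x` and the keys from the context (for example an encoder output), so m - n has no built-in meaning and the two sequences need not even have the same length.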

Hey, this may be related: #38. Maybe it should raise an error instead of silently disabling rotary embeddings.
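The suggestion to fail loudly could look something like the following minimal sketch. `check_rotary_usage` is a hypothetical helper, not part of x-transformers, and `is not None` stands in for the library's `exists` check.

```python
def check_rotary_usage(rotary_pos_emb, has_context):
    # Hypothetical guard: refuse instead of silently skipping rotary embeddings.
    if rotary_pos_emb is not None and has_context:
        raise ValueError(
            "rotary_pos_emb was given, but rotary embeddings are not applied "
            "when cross-attending (context is present)"
        )

check_rotary_usage(None, has_context=True)               # nothing to apply: fine
check_rotary_usage(("freqs", None), has_context=False)   # self-attention: fine
try:
    check_rotary_usage(("freqs", None), has_context=True)
except ValueError as err:
    print("raised:", err)
```

Raising makes a misconfiguration visible immediately. A softer alternative would be `warnings.warn`, which keeps existing models running while still flagging that the embeddings are ignored.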