2D RoPE ablation?
Closed this issue · 1 comment
orrzohar commented
Hi,
Did you ever ablate the use of the 2D RoPE embeddings vs. the normal 1D RoPE/positional embeddings?
Best,
Orr
chrisc36 commented
We didn't do an ablation at scale, but in preliminary experiments we found that the 2D RoPE embeddings helped reduce the loss and made the model more stable.
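
For readers unfamiliar with the distinction being asked about, here is a minimal sketch of one common way to build 2D RoPE: the axial variant, where half of the channels are rotated by a patch's row index and the other half by its column index. This is an illustration under that assumption and not necessarily what this repository implements; the names `rope_1d`, `rope_2d`, and `dot`, and the default `base=10000.0`, are hypothetical choices for the example.

```python
import math


def rope_1d(vec, pos, base=10000.0):
    """Standard 1D RoPE: rotate consecutive channel pairs by pos * base**(-i/d)."""
    d = len(vec)
    assert d % 2 == 0, "channel count must be even"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # frequency decreases with channel index
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_2d(vec, row, col, base=10000.0):
    """Axial 2D RoPE (assumed variant): first half encodes row, second half col."""
    d = len(vec)
    assert d % 4 == 0, "channel count must be divisible by 4"
    half = d // 2
    return rope_1d(vec[:half], row, base) + rope_1d(vec[half:], col, base)


def dot(a, b):
    """Plain dot product, standing in for an attention logit."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.1, 0.9]
    k = [1.0, 0.3, -0.6, 0.8, -1.2, 0.4, 0.7, -0.2]
    # Both pairs of patches are separated by the same (row, col) offset of (1, 2).
    print(dot(rope_2d(q, 2, 3), rope_2d(k, 1, 1)))
    print(dot(rope_2d(q, 5, 7), rope_2d(k, 4, 5)))
```

With this construction, the attention logit between two patches depends only on their relative row and column offsets, so the two printed values should agree up to floating-point error. A 1D RoPE over the flattened patch sequence instead sees only the difference in flattened index, which treats the last patch of one row and the first patch of the next as neighbours.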