On rotary embeddings
chaitjo opened this issue · 1 comment
chaitjo commented
Hi @lucidrains, thank you for your amazing work; big fan! I had a quick question on the usage of this repository.
Based on my understanding, rotary embeddings are a drop-in replacement for the original sinusoidal or learned positional encodings (PEs) in Transformers for sequential data, as in NLP or other temporal applications. If my application does not involve sequential data, is there a reason why I should still use rotary embeddings?
For example, for molecular datasets such as QM9 (from the En-GNNs paper), would it make sense to use rotary embeddings?
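For context, here is my rough mental model of what the rotation does, just an illustrative sketch in plain PyTorch (using the half-split channel pairing convention), not this repo's actual API:

```python
import torch

def rotary_rotate(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # Illustrative sketch only (half-split pairing), not the repository's implementation.
    # x: (..., seq_len, dim) with dim even. Each channel pair (x1, x2) is rotated
    # by an angle proportional to the token position, so the q . k dot product
    # depends only on the relative offset between positions.
    *_, seq_len, dim = x.shape
    half = dim // 2
    # per-pair frequencies, same geometric schedule as sinusoidal PEs
    freqs = base ** (-torch.arange(half, dtype=x.dtype, device=x.device) / half)
    angles = torch.arange(seq_len, dtype=x.dtype, device=x.device)[:, None] * freqs  # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # standard 2D rotation applied pairwise
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# applied to queries and keys (not values) just before attention
q = rotary_rotate(torch.randn(2, 8, 128, 64))  # (batch, heads, seq, dim_head)
k = rotary_rotate(torch.randn(2, 8, 128, 64))
```

Since positions only enter through these rotations of q and k, it's not clear to me what the "position" should even be for unordered inputs like molecules, which is what prompted the question.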
hypnopump commented
Hi there!
I think that, in principle, there's no reason to use them for non-sequential data, although if you do try, please report your results.
Best,
Eric