lucidrains/En-transformer

On rotary embeddings

chaitjo opened this issue · 1 comment

Hi @lucidrains, thank you for your amazing work; big fan! I had a quick question on the usage of this repository.

Based on my understanding, rotary embeddings are a drop-in replacement for the original sinusoidal or learnt positional embeddings in Transformers for sequential data, as in NLP or other temporal applications. If my application is not on sequential data, is there a reason I should still use rotary embeddings?

E.g. for molecular datasets such as QM9 (from the En-GNNs paper), would it make sense to have rotary embeddings?
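For concreteness, here is a minimal sketch of what I understand rotary embeddings to do. This is purely illustrative PyTorch (the half-split variant), not this repo's actual implementation: pairs of query/key channels are rotated by an angle proportional to the token's position, so the attention dot products end up depending on relative position.

```python
import torch

def rotary_rotate(x, base = 10000):
    # x: (batch, seq_len, dim) with dim even
    b, n, d = x.shape
    half = d // 2
    # per-channel frequencies, as in the original sinusoidal scheme
    freqs = base ** (-torch.arange(half, dtype = torch.float32) / half)           # (half,)
    angles = torch.arange(n, dtype = torch.float32)[:, None] * freqs[None, :]     # (n, half)
    cos, sin = angles.cos(), angles.sin()
    # apply a 2D rotation to each (x1, x2) channel pair
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim = -1)

q = torch.randn(1, 16, 64)
k = torch.randn(1, 16, 64)
# dot products between the rotated q and k now encode relative offsets
q, k = rotary_rotate(q), rotary_rotate(k)
```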

Hi there!

I think that in principle there's no reason to use them for non-sequential data, but if you do try them, please report your results!
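Something like the following should work as a starting point. This is a rough sketch from memory of the README; the constructor arguments, including the rel_pos_emb flag that I recall gating the rotary embeddings, may differ in the current version, so please double-check before copying.

```python
import torch
from en_transformer import EnTransformer

# argument names are from memory of the README and may need adjusting;
# rel_pos_emb is the flag I recall controlling rotary embeddings, so for
# unordered sets (e.g. QM9 molecules) it would simply stay off
model = EnTransformer(
    num_tokens = 10,       # e.g. number of atom types
    dim = 64,
    depth = 4,
    dim_head = 64,
    heads = 4,
    neighbors = 16,
    rel_pos_emb = False    # no sequence order, so no rotary embeddings
)

atoms = torch.randint(0, 10, (1, 16))   # atom type ids
coors = torch.randn(1, 16, 3)           # 3D coordinates
feats, coors = model(atoms, coors)      # updated features and coordinates
```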

Best,
Eric