Comparison with updated Based
obv-mikhail opened this issue · 2 comments
Based architecture seems to have been updated - https://arxiv.org/abs/2402.18668. Any insights into how it compares with Rebased?
From our point of view, the updated arXiv version of Based reads more like follow-up research on subquadratic architectures than a simple upgrade. The new version introduces a combination of linear attention and sliding window attention, which is orthogonal to the choice of linear attention kernel studied in our paper. At the moment, we do not have evaluations of the ReBased kernel combined with sliding window attention.
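For readers new to the thread, here is a rough NumPy sketch of what "choice of linear attention kernel" means and why it is independent of the surrounding architecture. The feature maps below are illustrative simplifications (Based's is a truncated Taylor expansion of `exp`; ReBased's is a learnable normalized quadratic), not the actual implementations from either paper:

```python
import numpy as np

def based_feature_map(x):
    # Sketch of Based's kernel: 2nd-order Taylor approximation of exp(q.k),
    # phi(x) = [1, x, outer(x, x)/sqrt(2)] flattened along the feature axis.
    ones = np.ones(x.shape[:-1] + (1,))
    quad = (x[..., :, None] * x[..., None, :]).reshape(x.shape[:-1] + (-1,))
    return np.concatenate([ones, x, quad / np.sqrt(2.0)], axis=-1)

def rebased_feature_map(x, gamma, beta):
    # Sketch of ReBased's kernel: learnable affine normalization, then square.
    mu = x.mean(-1, keepdims=True)
    sigma = x.std(-1, keepdims=True) + 1e-6
    return (gamma * (x - mu) / sigma + beta) ** 2

def causal_linear_attention(q, k, v, phi):
    # Causal linear attention over a (T, d) sequence: a running state replaces
    # the T x T attention matrix, so cost is linear in sequence length.
    qf, kf = phi(q), phi(k)
    out = np.zeros_like(v)
    S = np.zeros((qf.shape[-1], v.shape[-1]))  # running sum of outer(kf_t, v_t)
    z = np.zeros(qf.shape[-1])                 # running sum of kf_t (normalizer)
    for t in range(q.shape[0]):
        S += np.outer(kf[t], v[t])
        z += kf[t]
        out[t] = (qf[t] @ S) / (qf[t] @ z + 1e-6)
    return out
```

The point of "orthogonality" is that `causal_linear_attention` is agnostic to `phi`: swapping the Based kernel for the ReBased one (or interleaving this layer with sliding window attention layers) changes neither of the other components.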
Hi, I've just finished training the small 124M model, and it seems that replacing conv1d with sliding window attention is indeed orthogonal to the Based/ReBased comparison, as we still achieve a slightly better loss value. We will update our preprint, and we plan to release the training pipeline and weights. Stay tuned!