BlinkDL/RWKV-LM

Can RWKV beat Flash Attention?

yxchng opened this issue · 1 comment

I have been experimenting with RWKV v4 and v4neo, but they use much more memory (about 2x) than my LM that uses Flash Attention. I'm not sure what I am doing wrong. Is this expected?
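For a fair comparison, the number to compare is the peak allocator usage over one full forward+backward step for each model. A minimal PyTorch sketch; `rwkv_model`, `flash_attn_model`, the batch/context sizes, and the assumption that `model(x)` returns `(bsz, ctxlen, vocab)` logits are placeholders, not values from this repo:

```python
import torch
import torch.nn.functional as F

def peak_gib(model, bsz, ctxlen, vocab=50277, device="cuda"):
    # Reset the CUDA allocator's high-water mark before this run.
    torch.cuda.reset_peak_memory_stats(device)
    x = torch.randint(vocab, (bsz, ctxlen), device=device)
    y = torch.randint(vocab, (bsz, ctxlen), device=device)
    logits = model(x)  # assumed output shape: (bsz, ctxlen, vocab)
    loss = F.cross_entropy(logits.view(-1, vocab), y.view(-1))
    loss.backward()    # backward is where activation memory dominates
    return torch.cuda.max_memory_allocated(device) / 2**30

# Call identically for both models, e.g.:
# print(peak_gib(rwkv_model, bsz=8, ctxlen=1024))
# print(peak_gib(flash_attn_model, bsz=8, ctxlen=1024))
```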

Try v5 first. What's your model size, bsz, and ctxlen?
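Those three numbers matter because activation memory for a training step grows roughly linearly in all of them. A back-of-envelope sketch only; `tensors_per_layer` is a made-up constant for illustration, not measured from either codebase:

```python
# Rough fp16 activation-memory estimate: each layer keeps a handful of
# (bsz, ctxlen, d_model) tensors alive for the backward pass.
def act_gib(bsz, ctxlen, d_model, n_layers, tensors_per_layer=8, bytes_per=2):
    return bsz * ctxlen * d_model * n_layers * tensors_per_layer * bytes_per / 2**30

print(act_gib(bsz=8, ctxlen=1024, d_model=2048, n_layers=24))  # 6.0
```

A difference in that per-layer constant between two implementations (how many intermediates each keeps for backward) can plausibly account for a ~2x gap, which is why the exact config is needed to diagnose it.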