yxchng opened this issue 2 months ago · 1 comment
I have been experimenting with RWKV v4 and v4neo, but they use much more memory (about 2x) than my LM that uses Flash Attention. I'm not sure what I am doing wrong. Is this expected?
Try v5 first. What are your model size, batch size (bsz), and context length (ctxlen)?