RWKV split CPU & GPU results in high perplexity
3outeille opened this issue · 4 comments
System Info
Using the PR from #22797 (comment), I tried to evaluate perplexity on wikitext2 with the Hugging Face RWKV model but ran into weird behavior (gist to reproduce the bug: https://gist.github.com/3outeille/e74ec833ec2800a94325f8dad8e0da3d).
- When the model is fully loaded on CPU or on GPU, perplexity is fine
- When some blocks of RWKV are loaded on CPU and others on GPU, perplexity is very high
Any idea?
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
https://gist.github.com/3outeille/e74ec833ec2800a94325f8dad8e0da3d
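For context, here is a minimal sketch of the kind of split load that triggers the problem. The checkpoint name, dtype, and memory budget below are illustrative assumptions, not taken from the gist:

```python
import torch
from transformers import AutoTokenizer, RwkvForCausalLM

# Assumed checkpoint for illustration; the gist may use a different one.
model_id = "RWKV/rwkv-4-169m-pile"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# A small GPU memory budget forces accelerate to place some RWKV blocks on GPU 0
# and the remaining ones on CPU, i.e. the "Split" configuration described above.
model = RwkvForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "300MB", "cpu": "8GB"},
)

# Shows which modules ended up on cuda:0 vs cpu.
print(model.hf_device_map)
```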
Expected behavior
- Full CPU ✔️:
nlls: tensor([2.0129, 2.3220, 2.3500])
Perplexity: 9.284077644348145
- Full GPU ✔️:
nlls: tensor([2.0137, 2.3223, 2.3496], device='cuda:0', dtype=torch.float16)
Perplexity: 9.2890625
- Split 🔴:
nlls: tensor([15.6641, 15.9141, 16.5469], device='cuda:0', dtype=torch.float16)
Perplexity: 9312564.0
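For reference, numbers like the ones above come out of a standard strided NLL loop over wikitext-2. Below is a sketch continuing from the loading snippet in the Reproduction section; the window size and stride are assumptions, not values taken from the gist:

```python
import torch
from datasets import load_dataset

# Continues from the loading sketch above (model and tokenizer already created).
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length, stride = 1024, 1024  # assumed window/stride
nlls = []
for begin in range(0, encodings.input_ids.size(1) - max_length + 1, stride):
    # Inputs go to the GPU; accelerate's hooks move activations between devices.
    input_ids = encodings.input_ids[:, begin : begin + max_length].to(0)
    with torch.no_grad():
        # With labels == input_ids the model shifts internally and returns the mean NLL.
        loss = model(input_ids, labels=input_ids).loss
    nlls.append(loss)

nlls = torch.stack(nlls)
print("nlls:", nlls)
print("Perplexity:", torch.exp(nlls.mean()).item())
```

The same loop produces sane NLLs for the full-CPU and full-GPU cases, so the blow-up only appears once blocks are spread across devices.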
@younesbelkada Any update?
Hi @3outeille
Sadly I didn't have time to check that out. Are you still facing the issue with the latest main branch of transformers & accelerate?
Hi @younesbelkada, I updated transformers & accelerate to the latest release versions as shown here: https://github.com/3outeille/hf_rwkv_bug/blob/master/requirements.txt and the bug is still there.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.