huggingface/transformers

RWKV split CPU & GPU results in high perplexity

3outeille opened this issue · 4 comments

System Info

Using the PR from #22797 (comment), I tried to evaluate perplexity on wikitext2 with the HuggingFace RWKV model but found weird behavior (gist to reproduce the bug: https://gist.github.com/3outeille/e74ec833ec2800a94325f8dad8e0da3d).

  • When the model is fully loaded on CPU or fully on GPU, perplexity is fine
  • When some blocks of RWKV are loaded on CPU and others on GPU, perplexity is extremely high

Any idea?
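For context, the evaluation is essentially the standard sliding-window perplexity recipe. A minimal sketch of the setup (the checkpoint name, window size, and number of windows are illustrative assumptions, not necessarily what the gist uses; the exact script is in the gist above):

```python
# Sketch: load RWKV with a device_map so some blocks land on GPU and the rest
# stay on CPU, then measure perplexity on wikitext-2.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV/rwkv-4-169m-pile"  # example checkpoint, not necessarily the one in the gist
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" lets accelerate dispatch some RWKV blocks to GPU and keep the rest on CPU
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length = stride = 1024
nlls = []
for begin in range(0, 3 * stride, stride):  # three windows, matching the three NLLs reported below
    input_ids = encodings.input_ids[:, begin : begin + max_length].to(model.device)
    with torch.no_grad():
        # labels == input_ids -> the returned loss is the mean NLL over the window
        nlls.append(model(input_ids, labels=input_ids).loss)
print("nlls:", torch.stack(nlls))
print("Perplexity:", torch.exp(torch.stack(nlls).mean()).item())
```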

Who can help?

@sgugger, @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

https://gist.github.com/3outeille/e74ec833ec2800a94325f8dad8e0da3d

Expected behavior

  • Full CPU ✔️ :
    • nlls: tensor([2.0129, 2.3220, 2.3500])
    • Perplexity: 9.284077644348145
  • Full GPU ✔️ :
    • nlls: tensor([2.0137, 2.3223, 2.3496], device='cuda:0', dtype=torch.float16)
    • Perplexity: 9.2890625
  • Split 🔴 :
    • nlls: tensor([15.6641, 15.9141, 16.5469], device='cuda:0', dtype=torch.float16)
    • Perplexity: 9312564.0
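
For what it's worth, the perplexity numbers above follow directly from the NLLs (perplexity = exp of the mean NLL, the usual definition, which I assume the gist also uses), so the regression is already visible per token:

```python
# The perplexities above are just exp of the mean NLL, so the anomaly shows up
# directly in the per-window NLLs (~16 nats/token for the split run vs ~2.2 otherwise).
import torch

for name, nlls in {
    "full CPU": [2.0129, 2.3220, 2.3500],
    "full GPU": [2.0137, 2.3223, 2.3496],
    "split":    [15.6641, 15.9141, 16.5469],
}.items():
    ppl = torch.exp(torch.tensor(nlls).mean()).item()
    print(f"{name}: perplexity ≈ {ppl:.4g}")
# full CPU / full GPU come out around 9.28-9.29; the split run comes out around
# 9.3e6 (the reported 9312564.0 differs slightly because it was computed in fp16).
```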

@younesbelkada Any update?

Hi @3outeille
Sadly I haven't had time to check that out yet. Are you still facing the issue with the latest main branch of transformers & accelerate?

Hi @younesbelkada, I updated transformers & accelerate to the latest release versions as shown here: https://github.com/3outeille/hf_rwkv_bug/blob/master/requirements.txt, and the bug is still there.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.