tloen/llama-int8

Getting error on generation in Windows

elephantpanda opened this issue · 4 comments

I installed bitsandbytes following the guide for Windows,
including the DLL from here.

Everything works fine; it loads 7B into about 8 GB of VRAM. Great.

But during generation I get:

  File "example.py", line 103, in main
    results = generator.generate(
  File "C:\Users\Shadow\Documents\LLama\llama-int8-main\llama\generation.py", line 60, in generate
    next_token = torch.multinomial(
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Any ideas what went wrong?
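For context, `torch.multinomial` raises this exact RuntimeError when the probability tensor it is given contains `inf` or `nan`, which usually means the model itself produced non-finite logits (fp16 overflow is a common cause). A minimal sketch of a guard one could put around the sampling step (this is hypothetical illustration code, not the repo's `generation.py`):

```python
import torch

def sample_next_token(probs: torch.Tensor) -> torch.Tensor:
    # Fail early with a clearer message than torch.multinomial's RuntimeError
    # if the model emitted non-finite probabilities.
    if not torch.isfinite(probs).all():
        raise ValueError(
            "model produced non-finite probabilities; "
            "check for fp16 overflow / quantization issues on this GPU"
        )
    return torch.multinomial(probs, num_samples=1)

# Usage: probabilities from a softmax over dummy logits.
probs = torch.softmax(torch.tensor([1.0, 2.0, 3.0]), dim=-1)
token = sample_next_token(probs)  # index in {0, 1, 2}
```

This does not fix the underlying NaN, but it pinpoints that the problem is upstream of sampling, in the forward pass.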

same question

Same here, did you fix this?

I hit the same error testing on a Tesla P40, but it ran successfully on an RTX A5000. Maybe it is because of the lower compute capability of the graphics card?
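That hypothesis is plausible: the Tesla P40 is a Pascal card (compute capability 6.1) with very limited fp16 throughput and, on some code paths, less robust half-precision behavior, while the RTX A5000 (8.6) has full-rate fp16. A hedged sketch of how one might check this before loading the model (the 7.0 threshold, Volta, is the usual cutoff for full fp16 support; treat it as an assumption, not a guarantee that it explains this bug):

```python
import torch

def fp16_is_well_supported(device_index: int = 0) -> bool:
    """Heuristic: full-rate fp16 arrived with Volta (compute capability 7.0).

    Pascal cards like the Tesla P40 (6.1) run fp16 at a small fraction of
    fp32 speed and are a common source of half-precision trouble.
    """
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability(device_index)
    return (major, minor) >= (7, 0)

if not fp16_is_well_supported():
    print("warning: this GPU may not handle fp16 inference well")
```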

thanks!!