pointnetwork/point-alpaca

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 while running on an RTX 3060 12GB, using 8-bit.

ThatCoffeeGuy opened this issue · 7 comments

After loading the 8bit model I am facing the following issue:

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 3/3 [00:14<00:00,
Human: asd

/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/transformers/generation/utils.py:1201: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Traceback (most recent call last):
  File "/home/sadmin/point-alpaca/chat.py", line 102, in <module>
    go()
  File "/home/sadmin/point-alpaca/chat.py", line 72, in go
    generated_ids = generator(
  File "/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/transformers/generation/utils.py", line 1452, in generate
    return self.sample(
  File "/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/transformers/generation/utils.py", line 2504, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
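
The traceback ends in torch.multinomial, which refuses to sample from a distribution containing any inf, nan or negative entry. The stdlib sketch below uses hypothetical helper names (it is not the torch implementation) to reproduce that check and to show one plausible route to it: a single inf logit, such as an fp16 overflow earlier in the forward pass might produce, turns a max-subtracted softmax into all nan. Nothing in this thread confirms overflow as the actual cause.

```python
import math


def stable_softmax(logits):
    """Softmax with max subtraction, the usual numerically stable form."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def check_probs(probs):
    """Hypothetical stand-in for the sampler's input validation."""
    for p in probs:
        if not math.isfinite(p) or p < 0:
            raise RuntimeError(
                "probability tensor contains either `inf`, `nan` or element < 0"
            )


# A healthy distribution passes the check.
check_probs(stable_softmax([2.0, 1.0, 0.5]))

# One overflowed logit: inf - inf is nan, so the whole distribution becomes nan.
broken = stable_softmax([float("inf"), 1.0, 0.5])
print(broken)  # [nan, nan, nan]

try:
    check_probs(broken)
except RuntimeError as err:
    print("rejected:", err)
```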

What I tried so far:

quantization_config = BitsAndBytesConfig(
    llm_int8_threshold=1.0,
)

I defined this as a variable, then passed quantization_config=quantization_config to model = transformers.LLaMAForCausalLM.from_pretrained([...]).cuda()
I also tried passing llm_int8_threshold=1.0 directly to the loader. Either way the model loads, but at generation time I get a different error:

    return self._apply(lambda t: t.cuda(device))
NotImplementedError: Cannot copy out of meta tensor; no data!
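
For context on the llm_int8_threshold=1.0 experiment: in the LLM.int8() scheme used by bitsandbytes, hidden-state features whose magnitude exceeds the threshold (6.0 by default) are treated as outliers and multiplied in fp16, while the remaining features go through int8. The pure-Python sketch below is a simplified illustration of that split with hypothetical helper names and made-up activations; the real kernels operate on GPU tensors with vector-wise scaling. It shows that lowering the threshold to 1.0 routes more features to the fp16 path. Whether that helps with the nan problem is not established here.

```python
def outlier_columns(activations, threshold):
    """Feature columns with any |value| >= threshold (fp16 path in this sketch)."""
    n_cols = len(activations[0])
    return [
        c for c in range(n_cols)
        if any(abs(row[c]) >= threshold for row in activations)
    ]


def quantize_int8_absmax(values):
    """Symmetric absmax quantization of floats into the int8 range [-127, 127]."""
    scale = max(abs(v) for v in values) or 1.0
    return [round(127 * v / scale) for v in values], scale


# Hypothetical hidden states: 2 tokens x 4 features, one large-magnitude feature.
hidden = [
    [0.4, -0.9, 7.5, 0.2],
    [0.3, 1.2, -8.1, -0.6],
]

print(outlier_columns(hidden, threshold=6.0))  # [2]
print(outlier_columns(hidden, threshold=1.0))  # [1, 2]

# The non-outlier features of the first token would take the int8 path.
outliers = outlier_columns(hidden, threshold=6.0)
inliers = [v for c, v in enumerate(hidden[0]) if c not in outliers]
print(quantize_int8_absmax(inliers))
```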

Hardware: RTX 3060 12GB, Ryzen 5700X, 24GB RAM

Seeing the same error on an RTX 3080 10GB, running in 8-bit mode:

RuntimeError: probability tensor contains either inf, nan or element < 0

Same error on a Tesla V100 32GB.

In my case it was an environment issue.
I created a new virtual environment with Python 3.9 using conda, and it works now.

Has anyone else been able to fix this issue? I checked, and I'm on Python 3.9.

Switched from Python 3.10 to Python 3.9 on the RTX 3080 10GB, but I'm still hitting the error.
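
Since switching to Python 3.9 fixed one setup but not another, the interpreter version alone is probably not the deciding factor; the library versions installed alongside it may be. A small stdlib script like the one below prints the relevant versions so a working and a failing environment can be compared side by side. The package list is an assumption; extend it as needed.

```python
import platform
from importlib import metadata

# Packages most likely to differ between a working and a failing setup
# (an assumption, not taken from the thread).
PACKAGES = ("torch", "transformers", "accelerate", "bitsandbytes")


def environment_report(packages=PACKAGES):
    """Return the Python version and package versions; missing packages map to None."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:>13}: {value or 'not installed'}")
```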

Same problem here with an RTX A4000:

$ python chat.py 
Loading ./result...
gpu_count 1

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /opt/src/point-alpaca/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%| 3/3 [00:32<00:00, 10.91s/it]
Human: Hello! I'm Alex. What's your name?

Traceback (most recent call last):
  File "/opt/src/point-alpaca/chat.py", line 95, in <module>
    go()
  File "/opt/src/point-alpaca/chat.py", line 65, in go
    generated_ids = generator(
  File "/opt/src/point-alpaca/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/src/point-alpaca/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1452, in generate
    return self.sample(
  File "/opt/src/point-alpaca/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2504, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

These are the changes that I made:

$ git diff
diff --git a/chat.py b/chat.py
index 56ac4e8..4a3a8e0 100644
--- a/chat.py
+++ b/chat.py
@@ -27,12 +27,12 @@ def load_model(model_name, eight_bit=0, device_map="auto"):
     model = transformers.LLaMAForCausalLM.from_pretrained(
         model_name,
         #device_map=device_map,
-        #device_map="auto",
+        device_map="auto",
         torch_dtype=torch.float16,
         #max_memory = {0: "14GB", 1: "14GB", 2: "14GB", 3: "14GB",4: "14GB",5: "14GB",6: "14GB",7: "14GB"},
         #load_in_8bit=eight_bit,
         low_cpu_mem_usage=True,
-        load_in_8bit=False,
+        load_in_8bit=True,
         cache_dir="cache"
     ).cuda()

The load_in_8bit=True change follows the README; the device_map="auto" change was explicitly demanded once load_in_8bit was set:

$ python chat.py 
Loading ./result...
gpu_count 1
Traceback (most recent call last):
  File "/opt/src/point-alpaca/chat.py", line 41, in <module>
    load_model("./result")
  File "/opt/src/point-alpaca/chat.py", line 27, in load_model
    model = transformers.LLaMAForCausalLM.from_pretrained(
  File "/opt/src/point-alpaca/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2151, in from_pretrained
    raise ValueError(
ValueError: A device map needs to be passed to run convert models into mixed-int8 format. Please run`.from_pretrained` with `device_map='auto'`
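
The two changes in the diff are coupled: 8-bit loading refuses to run without a device map. The stdlib sketch below builds the loader keyword arguments with that rule enforced up front. The helper name build_load_kwargs is hypothetical, and torch_dtype=torch.float16 is left to the caller so the sketch does not need torch installed.

```python
def build_load_kwargs(eight_bit=False, device_map="auto", cache_dir="cache"):
    """Hypothetical helper: keyword arguments for an 8-bit-capable from_pretrained call.

    Mirrors the ValueError in the log: an 8-bit load without a device map
    is rejected before the (slow) model load even starts.
    """
    if eight_bit and device_map is None:
        raise ValueError("load_in_8bit=True requires a device_map, e.g. 'auto'")
    kwargs = {
        "low_cpu_mem_usage": True,
        "load_in_8bit": bool(eight_bit),
        "cache_dir": cache_dir,
    }
    if device_map is not None:
        kwargs["device_map"] = device_map
    return kwargs


print(build_load_kwargs(eight_bit=True))
```

A caller would then use something like transformers.LLaMAForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, **kwargs). In this sketch the trailing .cuda() from chat.py is deliberately omitted, on the assumption that the device map already places the modules; if some weights were offloaded, an extra .cuda() may be what triggers the "Cannot copy out of meta tensor" error reported earlier. Neither point is confirmed in this thread.
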
sswam commented

I'm having this problem too, and I'm not sure what's causing it. oobabooga/text-generation-webui works in 8-bit with this model, but I'd rather not use it at the moment. I guess I'll have to go and see what they are doing differently...