RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 while running on an RTX 3060 12GB, using 8-bit.
ThatCoffeeGuy opened this issue · 7 comments
After loading the 8bit model I am facing the following issue:
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 3/3 [00:14<00:00, [28/1000$
Human: asd
/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/transformers/generation/utils.py:1201: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
Traceback (most recent call last):
File "/home/sadmin/point-alpaca/chat.py", line 102, in <module>
go()
File "/home/sadmin/point-alpaca/chat.py", line 72, in go
generated_ids = generator(
File "/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/transformers/generation/utils.py", line 1452, in generate
return self.sample(
File "/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/transformers/generation/utils.py", line 2504, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
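The error message names three separate conditions, and it can help to know which one is actually present. The following is a minimal sketch in plain Python, not the check that `torch.multinomial` itself runs: `diagnose_probs` and `is_valid_distribution` are hypothetical helpers that count the offending entries in a list of probabilities (for example one row of `probs` converted with `.tolist()`, assuming you can get at it while debugging).

```python
import math


def diagnose_probs(probs):
    """Count entries that match the 'inf, nan or element < 0' conditions.

    `probs` is any iterable of floats. Hypothetical debugging helper.
    """
    report = {"nan": 0, "inf": 0, "negative": 0}
    for p in probs:
        if math.isnan(p):
            report["nan"] += 1
        elif math.isinf(p):
            report["inf"] += 1  # counts both +inf and -inf
        elif p < 0:
            report["negative"] += 1
    return report


def is_valid_distribution(probs):
    """True when no entry is nan, infinite, or negative."""
    return not any(diagnose_probs(probs).values())


if __name__ == "__main__":
    print(diagnose_probs([0.2, float("nan"), 0.8]))
```

A non-zero `nan` count would suggest the logits were already non-finite before the softmax, which narrows the search to the forward pass or the logit processors.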
What I tried so far:
I defined
quantization_config = BitsAndBytesConfig(
    llm_int8_threshold=1.0,
)
as a variable, then passed
quantization_config=quantization_config
to model = transformers.LLaMAForCausalLM.from_pretrained([...]).cuda()
I also tried passing llm_int8_threshold=1.0 directly to the loader. Either way the model loads, but at generation I get a different error:
return self._apply(lambda t: t.cuda(device))
NotImplementedError: Cannot copy out of meta tensor; no data!
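The frame shown (`t.cuda(device)`) is inside `torch.nn.Module.cuda()`, so a `.cuda()` call on the model is what raises. One thing that may be worth trying, as an assumption rather than a confirmed fix, is to let `device_map="auto"` place the weights and drop the trailing `.cuda()`. In the sketch below, `eight_bit_kwargs` and `load_model_8bit` are hypothetical names; the keyword arguments are the ones that appear in the `chat.py` diff in this thread.

```python
def eight_bit_kwargs(cache_dir="cache"):
    """Keyword arguments for from_pretrained when loading in 8-bit.

    Hypothetical helper; the keys mirror the ones used in chat.py.
    """
    return {
        "device_map": "auto",       # required by transformers for 8-bit loading
        "load_in_8bit": True,
        "low_cpu_mem_usage": True,
        "cache_dir": cache_dir,
    }


def load_model_8bit(model_name):
    """Load the model in 8-bit without moving it again afterwards."""
    # Imported lazily so this file can be imported without a GPU stack.
    import torch
    import transformers

    model = transformers.LLaMAForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        **eight_bit_kwargs(),
    )
    # Deliberately no .cuda() here: device_map has already placed the weights.
    return model
```

Whether this also affects the `inf`/`nan` sampling error is not established by anything in this thread.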
Hardware: RTX 3060 12GB, Ryzen 5700X, 24GB RAM
Seeing the same error on an RTX 3080 10GB, running in 8-bit mode.
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Same error on a Tesla V100 32GB.
In my case it was an environment issue.
I created a new virtual environment with Python 3.9 using conda, and it works now.
Has anyone else been able to fix this issue? I checked and I'm on Python 3.9.
Changed from Python 3.10 to Python 3.9 on an RTX 3080 10GB, but I'm still hitting the error.
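Since the reports above differ on whether switching Python versions helps, it may be useful to compare complete environments rather than the interpreter alone. This is a minimal sketch using only the standard library; `collect_versions` is a hypothetical helper, and the package list is an assumption about what is relevant here.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "transformers", "bitsandbytes", "accelerate")):
    """Return a dict of installed versions; None marks a missing package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Posting this output from a working and a failing setup side by side would make it easier to see which package actually differs.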
Having the same with RTX A4000:
$ python chat.py
Loading ./result...
gpu_count 1
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /opt/src/point-alpaca/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%| 3/3 [00:32<00:00, 10.91s/it]
Human: Hello! I'm Alex. What's your name?
Traceback (most recent call last):
File "/opt/src/point-alpaca/chat.py", line 95, in <module>
go()
File "/opt/src/point-alpaca/chat.py", line 65, in go
generated_ids = generator(
File "/opt/src/point-alpaca/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/src/point-alpaca/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1452, in generate
return self.sample(
File "/opt/src/point-alpaca/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2504, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
These are the changes that I made:
$ git diff
diff --git a/chat.py b/chat.py
index 56ac4e8..4a3a8e0 100644
--- a/chat.py
+++ b/chat.py
@@ -27,12 +27,12 @@ def load_model(model_name, eight_bit=0, device_map="auto"):
model = transformers.LLaMAForCausalLM.from_pretrained(
model_name,
#device_map=device_map,
- #device_map="auto",
+ device_map="auto",
torch_dtype=torch.float16,
#max_memory = {0: "14GB", 1: "14GB", 2: "14GB", 3: "14GB",4: "14GB",5: "14GB",6: "14GB",7: "14GB"},
#load_in_8bit=eight_bit,
low_cpu_mem_usage=True,
- load_in_8bit=False,
+ load_in_8bit=True,
cache_dir="cache"
).cuda()
I set load_in_8bit=True according to the README. The second change, device_map="auto", was explicitly demanded once the first one was set:
$ python chat.py
Loading ./result...
gpu_count 1
Traceback (most recent call last):
File "/opt/src/point-alpaca/chat.py", line 41, in <module>
load_model("./result")
File "/opt/src/point-alpaca/chat.py", line 27, in load_model
model = transformers.LLaMAForCausalLM.from_pretrained(
File "/opt/src/point-alpaca/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2151, in from_pretrained
raise ValueError(
ValueError: A device map needs to be passed to run convert models into mixed-int8 format. Please run`.from_pretrained` with `device_map='auto'`
I'm having this problem too. Not sure what the problem is. oobabooga/text-generation-webui works in 8-bit with this model, but I'd rather not use it at the moment. I guess I'll have to go and see what they are doing differently...