Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model
okoliechykwuka opened this issue · 1 comment
okoliechykwuka commented
Thanks for making this code available on GitHub.
Running this section of your code during inference, I get the error below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftConfig, PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# change peft_model_id
peft_model_id = "chukypedro/falcon7b-gpt_model"
config = PeftConfig.from_pretrained(peft_model_id)

model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

model = PeftModel.from_pretrained(model, peft_model_id)
Error Traceback
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <cell line: 11>:11 │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py:485 in │
│ from_pretrained │
│ │
│ 482 │ │ │ │ class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs │
│ 483 │ │ │ ) │
│ 484 │ │ │ _ = hub_kwargs.pop("code_revision", None) │
│ ❱ 485 │ │ │ return model_class.from_pretrained( │
│ 486 │ │ │ │ pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, │
│ 487 │ │ │ ) │
│ 488 │ │ elif type(config) in cls._model_mapping.keys(): │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:2815 in from_pretrained │
│ │
│ 2812 │ │ │ │ │ key: device_map[key] for key in device_map.keys() if key not in modu │
│ 2813 │ │ │ │ } │
│ 2814 │ │ │ │ if "cpu" in device_map_without_lm_head.values() or "disk" in device_map_ │
│ ❱ 2815 │ │ │ │ │ raise ValueError( │
│ 2816 │ │ │ │ │ │ """ │
│ 2817 │ │ │ │ │ │ Some modules are dispatched on the CPU or the disk. Make sure yo │
│ 2818 │ │ │ │ │ │ the quantized model. If you want to dispatch the model on the CP │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
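For reference, the workaround the error message points to looks roughly like the sketch below. It is only a sketch: depending on your transformers version the flag is spelled `llm_int8_enable_fp32_cpu_offload` (a field on `BitsAndBytesConfig`) rather than `load_in_8bit_fp32_cpu_offload`, and the `device_map` uses Falcon-style module names purely for illustration; the actual split depends on your model and available VRAM.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftConfig

peft_model_id = "chukypedro/falcon7b-gpt_model"
config = PeftConfig.from_pretrained(peft_model_id)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Allow non-quantized modules that land on the CPU to stay in fp32.
    llm_int8_enable_fp32_cpu_offload=True,
)

# Illustrative split: keep embeddings and transformer blocks on GPU 0,
# push the rest to CPU. Module names here are assumptions, not a verified map.
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.h": 0,
    "transformer.ln_f": "cpu",
    "lm_head": "cpu",
}

model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map=device_map,
    trust_remote_code=True,
)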
okoliechykwuka commented
I discovered I wasn't connected to a GPU.
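That explains the error: with no CUDA device visible, `device_map="auto"` dispatches every module to CPU/disk, which the quantization check rejects. A minimal sanity check before loading (plain PyTorch, nothing model-specific):

import torch

# Fail fast if no CUDA device is visible; a 4-bit quantized model
# cannot be dispatched to the GPU without one.
assert torch.cuda.is_available(), "No GPU detected - check your runtime type"
print(torch.cuda.get_device_name(0))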