NisaarAgharia/Indian-LawyerGPT

Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit

okoliechykwuka opened this issue · 1 comment

Thanks for making this code available on GitHub.

When I run this section of your code during inference, I get the error below.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftConfig, PeftModel

#change peft_model_id
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

peft_model_id = "chukypedro/falcon7b-gpt_model"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token
model = PeftModel.from_pretrained(model, peft_model_id)
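
Once loading succeeds, the inference call itself is roughly the sketch below; the prompt and generation parameters are illustrative, not the exact values from my notebook.

prompt = "What does Section 420 of the Indian Penal Code cover?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))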

Error Traceback

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <cell line: 11>:11                                                                            │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py:485 in          │
│ from_pretrained                                                                                  │
│                                                                                                  │
│   482 │   │   │   │   class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs           │
│   483 │   │   │   )                                                                              │
│   484 │   │   │   _ = hub_kwargs.pop("code_revision", None)                                      │
│ ❱ 485 │   │   │   return model_class.from_pretrained(                                            │
│   486 │   │   │   │   pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs,   │
│   487 │   │   │   )                                                                              │
│   488 │   │   elif type(config) in cls._model_mapping.keys():                                    │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:2815 in from_pretrained   │
│                                                                                                  │
│   2812 │   │   │   │   │   key: device_map[key] for key in device_map.keys() if key not in modu  │
│   2813 │   │   │   │   }                                                                         │
│   2814 │   │   │   │   if "cpu" in device_map_without_lm_head.values() or "disk" in device_map_  │
│ ❱ 2815 │   │   │   │   │   raise ValueError(                                                     │
│   2816 │   │   │   │   │   │   """                                                               │
│   2817 │   │   │   │   │   │   Some modules are dispatched on the CPU or the disk. Make sure yo  │
│   2818 │   │   │   │   │   │   the quantized model. If you want to dispatch the model on the CP  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError:
                        Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
                        the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
                        these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
                        `device_map` to `from_pretrained`. Check
                        https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
                        for more details.
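
For anyone who genuinely runs out of GPU RAM, the workaround the error points at looks roughly like the sketch below. This is an assumption-laden sketch: in current transformers releases the BitsAndBytesConfig flag is named llm_int8_enable_fp32_cpu_offload (the message above uses an older name), and the device_map module names are just an example layout for a Falcon-style model.

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_enable_fp32_cpu_offload=True,  # keep CPU/disk-offloaded modules in fp32
)
# Example device_map: module names depend on the base model; these are Falcon-style names.
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.h": 0,
    "transformer.ln_f": 0,
    "lm_head": "cpu",
}
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map=device_map,
    trust_remote_code=True,
)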

I discovered I wasn't connected to a GPU, which is why all of the modules were being dispatched to the CPU.
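
For anyone else hitting this in a notebook, a quick sanity check before loading makes the failure mode obvious; a minimal sketch:

import torch

# If no CUDA device is visible, device_map="auto" dispatches every module to
# the CPU (or disk), which is exactly what the ValueError above complains about.
assert torch.cuda.is_available(), "No GPU detected - switch the runtime to a GPU before loading the 4-bit model"
print(torch.cuda.get_device_name(0))
print(f"{torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB GPU RAM available")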