SFT_of_an_LLM_using_Hugging_Face_tooling.ipynb problem with latest TRL 0.7.10
Analect opened this issue · 2 comments
@NielsRogge, thanks for your SFT video. I was running through the notebook on a RunPod RTX 4090, per your demonstration.

In the "Define SFTTrainer" step, the code was failing with `ValueError: You passed a DataCollatorForCompletionOnlyLM to the SFTTrainer. This is not compatible with the `packing` argument.`, akin to this issue. I was on TRL version 0.7.10 and ended up moving back to version 0.7.7, at which point things worked again. It would be good to understand what needs to be changed in this code for it to work with the latest TRL.
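For what it's worth, the error message itself points at the likely change: on newer TRL, `DataCollatorForCompletionOnlyLM` cannot be combined with `packing=True`, because packing concatenates samples into fixed-length blocks and would break the collator's prompt masking. A minimal sketch of the trainer setup with packing disabled follows; the `model`, `tokenizer`, and `train_dataset` variables and the `response_template` string are assumptions standing in for the notebook's own values:

```python
# Sketch only: assumes the notebook's model/tokenizer/dataset are already defined.
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

# The collator masks prompt tokens so loss is computed only on completions.
response_template = "<|assistant|>"  # assumption: whatever template the notebook uses
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    data_collator=collator,
    packing=False,          # must be False (or omitted) with this collator
    max_seq_length=2048,
    tokenizer=tokenizer,
)
```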
Also, having saved the model and then tried to run inference from it, I got this error on the final cell. I wasn't sure if that was related to me moving back a few versions in the TRL library to get the other part above working.
Thanks.
```python
ValueError                                Traceback (most recent call last)
Cell In[33], line 6
      3 output_dir = 'data/zephyr-7b-sft-lora'
      5 tokenizer = AutoTokenizer.from_pretrained(output_dir)
----> 6 model = AutoModelForCausalLM.from_pretrained(output_dir, load_in_4bit=True, device_map="auto")

File /usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py:566, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    564 elif type(config) in cls._model_mapping.keys():
    565     model_class = _get_model_class(config, cls._model_mapping)
--> 566     return model_class.from_pretrained(
    567         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    568     )
    569 raise ValueError(
    570     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    571     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    572 )

File /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:3792, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3788 device_map_without_lm_head = {
   3789     key: device_map[key] for key in device_map.keys() if key not in modules_to_not_convert
   3790 }
   3791 if "cpu" in device_map_without_lm_head.values() or "disk" in device_map_without_lm_head.values():
-> 3792     raise ValueError(
   3793         """
...
these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
`device_map` to `from_pretrained`. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
```
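If you do want to keep some modules offloaded rather than freeing GPU memory, the error message suggests opting into fp32 CPU offload explicitly. A hedged sketch of that alternative, using the `BitsAndBytesConfig` route (the path and the decision to offload are assumptions, not the notebook's method):

```python
# Sketch only: explicitly allow fp32 CPU offload for quantized loading.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_enable_fp32_cpu_offload=True,  # keep offloaded modules in fp32 on CPU
)
model = AutoModelForCausalLM.from_pretrained(
    "data/zephyr-7b-sft-lora",  # assumption: the notebook's save directory
    quantization_config=quant_config,
    device_map="auto",
)
```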
Hi,

Yes, regarding the last error: I think that's fixed if you clear up your GPU memory (since it was first occupied during training). So you might need to restart your notebook and then run only the inference part, which will place the model on the GPU.
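The restart-then-inference flow could look like the sketch below, run in a fresh kernel so `device_map="auto"` can place all quantized modules on the now-free GPU; the prompt format and `max_new_tokens` value are illustrative assumptions:

```python
# Sketch only: inference-only cells after a kernel restart.
from transformers import AutoTokenizer, AutoModelForCausalLM

output_dir = "data/zephyr-7b-sft-lora"

tokenizer = AutoTokenizer.from_pretrained(output_dir)
# With GPU memory freed by the restart, no modules spill to cpu/disk,
# so the offload ValueError no longer triggers.
model = AutoModelForCausalLM.from_pretrained(
    output_dir,
    load_in_4bit=True,
    device_map="auto",
)

prompt = "<|user|>\nWhat is supervised fine-tuning?</s>\n<|assistant|>\n"  # assumption: Zephyr chat format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```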
OK, thanks for the tip. Any guidance on getting the `SFTTrainer` config working with the latest TRL would be great too, if/when you get a chance.