NielsRogge/Transformers-Tutorials

SFT_of_an_LLM_using_Hugging_Face_tooling.ipynb problem with latest TRL 0.7.10

Analect opened this issue · 2 comments

@NielsRogge, thanks for your SFT video. I was running through the notebook on a RunPod RTX 4090, following your demonstration.

In the Define SFTTrainer step, the code was failing with ValueError: You passed a DataCollatorForCompletionOnlyLM to the SFTTrainer. This is not compatible with the packing argument., akin to this issue. I was on TRL 0.7.10 and ended up moving back to 0.7.7, at which point things worked again. It would be good to understand what needs to change in this code for it to work with the latest TRL.
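
For anyone hitting the same thing: newer TRL releases explicitly reject a DataCollatorForCompletionOnlyLM whenever packing is enabled, since packed sequences break completion-only loss masking. Below is a minimal sketch of the commonly reported workaround, which disables packing; the response_template string and the training_args / train_dataset names are placeholders for whatever the notebook actually defines, not its exact code.

from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

# Compute the loss only on the assistant's completion. The template string
# is an assumption; use the marker your chat format actually emits.
collator = DataCollatorForCompletionOnlyLM(
    response_template="<|assistant|>",
    tokenizer=tokenizer,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,            # placeholder TrainingArguments
    train_dataset=train_dataset,   # placeholder dataset
    dataset_text_field="text",     # assumed column name
    max_seq_length=2048,
    tokenizer=tokenizer,
    data_collator=collator,
    packing=False,  # the key change: packing must be off with this collator
)

Note there is no way to keep both: completion-only masking needs one example per sequence, so any throughput gained from packing=True has to be given up when using this collator.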

Also, after saving the model and then trying to run inference with it, I got the error below on the final cell. I wasn't sure whether that was related to my moving back a few versions of the TRL library to get the part above working.

Thanks.

ValueError                                Traceback (most recent call last)
Cell In[33], line 6
      3 output_dir = 'data/zephyr-7b-sft-lora'
      5 tokenizer = AutoTokenizer.from_pretrained(output_dir)
----> 6 model = AutoModelForCausalLM.from_pretrained(output_dir, load_in_4bit=True, device_map="auto")

File /usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py:566, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    564 elif type(config) in cls._model_mapping.keys():
    565     model_class = _get_model_class(config, cls._model_mapping)
--> 566     return model_class.from_pretrained(
    567         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    568     )
    569 raise ValueError(
    570     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    571     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    572 )

File /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:3792, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3788         device_map_without_lm_head = {
   3789             key: device_map[key] for key in device_map.keys() if key not in modules_to_not_convert
   3790         }
   3791         if "cpu" in device_map_without_lm_head.values() or "disk" in device_map_without_lm_head.values():
-> 3792             raise ValueError(
   3793                 """
...
                        these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
                        `device_map` to `from_pretrained`. Check
                        https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
                        for more details.

Hi,

Yes, regarding the last error: I think that's fixed by freeing up your GPU memory (it was still occupied from training). You might need to restart your notebook and then run only the inference part, which will place the model on the GPU.
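
To spell that out, here is a hedged sketch of the inference-only path: either restart the kernel, or drop the training-time references first, then reload from the saved directory exactly as in the failing cell. The trainer and model names are assumptions about what the training cells defined.

import gc
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# If you don't restart the kernel, release the training allocations first
# so the 4-bit weights fit entirely on the GPU.
del trainer, model
gc.collect()
torch.cuda.empty_cache()

output_dir = "data/zephyr-7b-sft-lora"
tokenizer = AutoTokenizer.from_pretrained(output_dir)
model = AutoModelForCausalLM.from_pretrained(
    output_dir,
    load_in_4bit=True,   # as in the original cell; weights must fit on GPU
    device_map="auto",
)

# Sanity check: no module should be mapped to "cpu" or "disk", which is
# what triggered the ValueError above.
print(model.hf_device_map)

If hf_device_map still shows cpu or disk entries, the memory was not actually released and a full kernel restart is the more reliable route.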

OK, thanks for the tip. Any guidance on getting the SFTTrainer config working with the latest TRL would also be great, if/when you get a chance.