Issue: Missing Generation of `pytorch_model.bin` File During Model Tuning
WilliamYi96 opened this issue · 5 comments
Thank you for sharing your interesting project!

Recently, when I ran `bash ./script/llama_prune.sh`, the pruning step worked perfectly fine. However, during the tuning step, although there was no error output, the generated structure only included the following:
- checkpoints-200
  - model.safetensors
  - optimizer.pt
  - rng_state.pth
  - scheduler.pt
  - trainer_state.json
  - training_args.bin
I noticed that the `pytorch_model.bin` file was not saved. I haven't modified the code, and I am using PyTorch version 2.1.2+cu121. Could you suggest what the possible reason for this might be?
Issue resolved. The reason lies in newer versions of the `transformers` library: starting from `transformers>=4.33.0`, `safetensors` has become the default format, replacing `pytorch_model.bin`. This can be addressed either by downgrading with `pip install transformers==4.33.0`, or by setting `safe_serialization=False` in `model.save_pretrained()`.
Tracking here: huggingface/transformers#28183
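As a minimal sketch of the `safe_serialization=False` fix (the tiny config and output path here are illustrative, not from the repo; in practice you would load the pruned checkpoint instead):

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny illustrative LLaMA model; stands in for the pruned/tuned checkpoint.
config = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    vocab_size=1000,
)
model = LlamaForCausalLM(config)

# safe_serialization=False writes the legacy pytorch_model.bin
# instead of the model.safetensors default.
model.save_pretrained("legacy_checkpoint", safe_serialization=False)
```

Note that this only affects saves made by calling `save_pretrained()` yourself; checkpoints written by the `Trainer` are controlled separately (see below in the thread).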
Two updates:
- `pip install transformers==4.33.0` leads to the following error:
  `AttributeError: 'LlamaTokenizer' object has no attribute 'added_tokens_decoder'. Did you mean: '_added_tokens_decoder'?`
- If using the latest `transformers` and setting `safe_serialization=False`, there is still no `pytorch_model.bin` saved.
This issue still exists.
Issue resolved. The problem is that `save_safetensors=False` must be set when constructing the trainer; otherwise, the `safe_serialization=False` above has no effect.
@WilliamYi96 Can we recover `pytorch_model.bin` from the safetensors representation? I have already run the fine-tuning on a bigger dataset for some time and want to avoid re-running the training. Or can we resume from the checkpoint and save after running for a few more steps?