convert_to_ds_params.py doesn't generate tokenizer
tammypi opened this issue · 3 comments
convert_to_ds_params.py only generates the llama-7b folder and the .pt files inside it, but it does not generate a tokenizer. However, the tokenizer_path parameter of tokenize_dataset.py requires a tokenizer. How can I get the tokenizer?
You can download the tokenizer from here. That link also provides the model files produced after running convert_to_ds_params.py.
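If it helps, here is a minimal sketch for confirming the downloaded tokenizer loads before passing its path as tokenizer_path to tokenize_dataset.py. It assumes the Hugging Face transformers package (with LLaMA support) is installed and that the tokenizer files were placed in ../llama-7b/, which is a hypothetical path:

```python
# Minimal sanity check that the downloaded tokenizer loads.
# Assumptions: transformers with LLaMA support is installed, and the
# tokenizer files (tokenizer.model, tokenizer_config.json, ...) were
# placed in ../llama-7b/ (hypothetical path).
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("../llama-7b/")
print(tokenizer.encode("Hello, LLaMA!"))
```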
I had a similar issue as @tammypi when I tried to run finetune_pp_peft.py. The script only generates .pt files (e.g. layer_00-model_states.pt). Therefore, when I run python finetune_pp_peft.py --model_path ../llama-7b/, it says that no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack was found in directory ../llama-7b/.
Alternatively, I could use src/transformers/models/llama/convert_llama_weights_to_hf.py to convert the model into HF format and then run finetune_pp_peft.py without any problem. Do you think it is a good idea to use convert_llama_weights_to_hf.py from the transformers package instead of your script? What is the difference? Thanks!
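For reference, this is roughly what I did. It is only a sketch: the paths and the 7B model size are assumptions, and the conversion command follows the usage documented for convert_llama_weights_to_hf.py in the transformers repository.

```python
# Sketch of the workflow, under these assumptions:
#   - the original LLaMA weights live in ../llama-raw/ (hypothetical path)
#   - transformers with LLaMA support is installed
#
# Step 1 (shell): convert the original weights to HF format, per the
# transformers documentation for convert_llama_weights_to_hf.py:
#   python src/transformers/models/llama/convert_llama_weights_to_hf.py \
#       --input_dir ../llama-raw/ --model_size 7B --output_dir ../llama-7b-hf/
#
# Step 2 (Python): the output directory now contains pytorch_model.bin and
# config.json, so it loads the way finetune_pp_peft.py expects.
from transformers import LlamaForCausalLM, LlamaTokenizer

model = LlamaForCausalLM.from_pretrained("../llama-7b-hf/")
tokenizer = LlamaTokenizer.from_pretrained("../llama-7b-hf/")
```

With that, finetune_pp_peft.py runs when --model_path points at the converted directory.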
Sorry for the mistake. I actually meant to reference convert_llama_weights_to_hf.py in this project but added convert_to_ds_params.py by mistake. Thanks for raising this issue; I have fixed the bug.