convert_to_ds_params.py doesn't generate tokenizer
tammypi opened this issue · 3 comments
convert_to_ds_params.py only generates the llama-7b folder and the .pt files inside it, but it does not generate a tokenizer. However, the tokenizer_path parameter of tokenize_dataset.py requires a tokenizer. How can I get the tokenizer?
You can download the tokenizer from here. That link also provides the model files produced after running convert_to_ds_params.py.
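If it helps, here is a minimal sketch for confirming the downloaded tokenizer loads before passing its path as tokenizer_path to tokenize_dataset.py. It assumes the Hugging Face transformers package (with LLaMA support) is installed and that the tokenizer files were placed in ../llama-7b/, which is a hypothetical path:

```python
# Minimal sanity check that the downloaded tokenizer loads.
# Assumptions: transformers with LLaMA support is installed, and the
# tokenizer files (tokenizer.model, tokenizer_config.json, ...) were
# placed in ../llama-7b/ (hypothetical path).
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("../llama-7b/")
print(tokenizer.encode("Hello, LLaMA!"))
```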
I had a similar issue as @tammypi when I tried to run finetune_pp_peft.py. The script only generates .pt files (e.g. layer_00-model_states.pt). Therefore, when I run python finetune_pp_peft.py --model_path ../llama-7b/, it says that no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack was found in directory ../llama-7b/.
Alternatively, I could use src/transformers/models/llama/convert_llama_weights_to_hf.py to convert the model into HF format and then run finetune_pp_peft.py without any problem. Do you think it is a good idea to use convert_llama_weights_to_hf.py from the transformers package instead of your script? What is the difference? Thanks!
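For reference, this is roughly what I did. It is only a sketch: the paths and the 7B model size are assumptions, and the conversion command follows the usage documented for convert_llama_weights_to_hf.py in the transformers repository.

```python
# Sketch of the workflow, under these assumptions:
#   - the original LLaMA weights live in ../llama-raw/ (hypothetical path)
#   - transformers with LLaMA support is installed
#
# Step 1 (shell): convert the original weights to HF format, per the
# transformers documentation for convert_llama_weights_to_hf.py:
#   python src/transformers/models/llama/convert_llama_weights_to_hf.py \
#       --input_dir ../llama-raw/ --model_size 7B --output_dir ../llama-7b-hf/
#
# Step 2 (Python): the output directory now contains pytorch_model.bin and
# config.json, so it loads the way finetune_pp_peft.py expects.
from transformers import LlamaForCausalLM, LlamaTokenizer

model = LlamaForCausalLM.from_pretrained("../llama-7b-hf/")
tokenizer = LlamaTokenizer.from_pretrained("../llama-7b-hf/")
```

With that, finetune_pp_peft.py runs when --model_path points at the converted directory.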
Sorry for the mistake. I actually meant to reference convert_llama_weights_to_hf.py in this project but added convert_to_ds_params.py by mistake. Thanks for raising this issue; I have fixed the bug.