Confused about the LLaMA Pro demo: why should `num_layers` 49 be divisible by `num_layer_trainable` 2?
hzgdeerHo opened this issue · 2 comments
Reminder
- I have read the README and searched the existing issues.
Reproduction
Using the LLaMA Pro example script to fine-tune the 01-ai/Yi-1.5-9B-Chat model:
Modified expand.sh:
python scripts/llama_pro.py \
    --model_name_or_path 01-ai/Yi-1.5-9B-Chat \
    --output_dir models/01-ai/Yi-1.5-9B-Chat \
    --num_expand 2
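For context, here is how I understand the expansion step (a minimal sketch, not the actual scripts/llama_pro.py code; the 48-layer count is taken from the model's config.json): the original decoder layers are split into `num_expand` groups, and an identity-initialized copy of the last layer of each group is inserted after that group.

```python
# Minimal sketch of the LLaMA Pro style block expansion as I understand it
# (illustrative only, not the actual scripts/llama_pro.py implementation).
num_orig_layers = 48                      # Yi-1.5-9B-Chat per its config.json
num_expand = 2                            # value passed to --num_expand
split = num_orig_layers // num_expand     # layers per group (24)

layout = []
for i in range(num_orig_layers):
    layout.append(("orig", i))
    if (i + 1) % split == 0:
        # the copied block starts as an identity mapping (its output
        # projections are zero-initialized), so the expanded model
        # initially behaves like the original one
        layout.append(("copy", i))

print(len(layout))  # 50 layers in the expanded model under this assumption
```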
Modified /examples/extras/llama_pro/llama3_freeze_sft.yaml:
### model
model_name_or_path: models/01-ai/Yi-1.5-9B-Chat
### method
stage: sft
do_train: true
finetuning_type: freeze
freeze_trainable_layers: 2
freeze_trainable_modules: all
use_llama_pro: true
### dataset
dataset: identity
template: yi
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/Yi-1.5-9B-Chat/freeze/sft
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 0.00005
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_steps: 0.1
fp16: true
### eval
val_size: 0.1
per_device_eval_batch_size: 1
evaluation_strategy: steps
eval_steps: 500
And I got the error message:
Traceback (most recent call last):
File "/home/ubuntu/python3.9/bin/llamafactory-cli", line 8, in
sys.exit(main())
File "/home/ubuntu/LLaMA-Factory/src/llamafactory/cli.py", line 65, in main
run_exp()
File "/home/ubuntu/LLaMA-Factory/src/llamafactory/train/tuner.py", line 34, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/home/ubuntu/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 34, in run_sft
model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
File "/home/ubuntu/LLaMA-Factory/src/llamafactory/model/loader.py", line 144, in load_model
model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
File "/home/ubuntu/LLaMA-Factory/src/llamafactory/model/adapter.py", line 73, in init_adapter
raise ValueError(
ValueError: num_layers 49 should be divisible by num_layer_trainable 2.
Expected behavior
It should work normally, since we intend to fine-tune only the 2 expanded blocks.
System Info
No response
Others
No response
If the original model has 47 layers and I expanded it to 49 layers, why does the script need to ensure that 49 is divisible by 2?
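For reference, here is my reading of the check that raises the error (a simplified sketch; identifiers are illustrative, not the exact adapter.py source):

```python
# Simplified sketch of the use_llama_pro freeze selection as I understand it
# (illustrative, not the exact LLaMA-Factory adapter.py code).
num_layers = 50            # layers in the expanded model (48 original + 2 new)
num_layer_trainable = 2    # freeze_trainable_layers

if num_layers % num_layer_trainable != 0:
    raise ValueError(
        f"num_layers {num_layers} should be divisible by "
        f"num_layer_trainable {num_layer_trainable}."
    )

stride = num_layers // num_layer_trainable
trainable_layer_ids = list(range(stride - 1, num_layers, stride))
print(trainable_layer_ids)  # [24, 49] -> the two freshly inserted blocks
```

Under this reading, the stride is meant to land exactly on the newly inserted blocks, which is why the layer count has to divide evenly.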
It appears that the original model has 48 layers: https://huggingface.co/01-ai/Yi-1.5-9B-Chat/blob/main/config.json#L16
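If it helps, the layer counts can be double-checked directly (a small snippet assuming the expanded checkpoint was saved to models/01-ai/Yi-1.5-9B-Chat as in the command above):

```python
from transformers import AutoConfig

# Original model: config.json on the Hub reports num_hidden_layers = 48.
orig = AutoConfig.from_pretrained("01-ai/Yi-1.5-9B-Chat")
print("original:", orig.num_hidden_layers)

# Expanded checkpoint produced by scripts/llama_pro.py (local path from the
# expand command above); the error message suggests this one reports 49.
expanded = AutoConfig.from_pretrained("models/01-ai/Yi-1.5-9B-Chat")
print("expanded:", expanded.num_hidden_layers)
```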