The loss is very unstable when supervised fine-tuning the 7b-100k-ft model
seanxuu opened this issue · 1 comment
seanxuu commented
When I use the LongAlpaca-12k dataset to supervised fine-tune the LongAlpaca-7B model, the loss is very unstable.
My command is:
```bash
Miniconda/envs/longlora/bin/python -u supervised-fine-tune.py \
    --model_name_or_path models/LongAlpaca-7B \
    --bf16 True \
    --output_dir LongLoRA/save/LongAlpaca-7B-origdata \
    --model_max_length 32768 \
    --use_flash_attn True \
    --data_path data/LongAlpaca-12k.json \
    --low_rank_training True \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy no \
    --save_strategy steps \
    --save_steps 1000 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --weight_decay 0.0 \
    --warmup_steps 20 \
    --lr_scheduler_type constant_with_warmup \
    --logging_steps 1 \
    --deepspeed ds_configs/stage2.json \
    --tf32 True
```
The loss curve looks like this:
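For what it's worth, with per_device_train_batch_size 1, gradient_accumulation_steps 1, and logging_steps 1, every logged value is the loss of a single example, so large step-to-step swings show up even when training is otherwise healthy. A minimal sketch for smoothing the logged curve, assuming the standard Hugging Face Trainer trainer_state.json written into a checkpoint directory (the checkpoint path below is hypothetical, adjust it to your run):

```python
import json

# Hypothetical checkpoint path under the output_dir; adjust to your run.
STATE = "LongLoRA/save/LongAlpaca-7B-origdata/checkpoint-1000/trainer_state.json"

with open(STATE) as f:
    state = json.load(f)

# log_history holds one dict per logging step; training entries carry a "loss" key.
losses = [e["loss"] for e in state["log_history"] if "loss" in e]

def moving_average(values, window=50):
    """Trailing moving average to smooth single-example loss values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

smoothed = moving_average(losses)
print("raw last 5:", [round(x, 3) for x in losses[-5:]])
print("avg last 5:", [round(x, 3) for x in smoothed[-5:]])
```

If the moving average trends down while the raw values jump around, the run is probably fine and the instability is mostly a logging artifact of the batch size.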
seanxuu commented
I also tried to train Llama-2-7b-longlora-100k-ft with my own dataset, which is sampled from your LongAlpaca-12k.json data, but the loss looks the same.
```bash
python supervised-fine-tune.py \
    --model_name_or_path /models/Llama-2-7b-longlora-100k-ft \
    --bf16 True \
    --output_dir LongLoRA/save/7b-100k-ft-origdata-mydata \
    --model_max_length 100000 \
    --use_flash_attn True \
    --data_path LongLoRA/pdf2txt/output/manual_data.json \
    --low_rank_training True \
    --num_train_epochs 5 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 98 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --weight_decay 0.0 \
    --warmup_steps 20 \
    --lr_scheduler_type "constant_with_warmup" \
    --logging_steps 1 \
    --deepspeed "ds_configs/stage2.json" \
    --tf32 True
```
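One thing that may be worth checking with a sampled dataset is how much the example lengths vary: with a batch size of 1, a mix of very short and very long samples by itself produces a jumpy per-step loss. A rough sketch, assuming alpaca-style records with "instruction" and "output" fields (as in LongAlpaca-12k.json); the field names and paths are assumptions, adjust them to your data:

```python
import json
from transformers import AutoTokenizer

# Paths and field names are assumptions; adjust to your setup.
DATA = "LongLoRA/pdf2txt/output/manual_data.json"
MODEL = "/models/Llama-2-7b-longlora-100k-ft"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

with open(DATA) as f:
    records = json.load(f)

# Token length of prompt plus answer for each record (alpaca-style fields assumed).
lengths = sorted(
    len(tokenizer(r.get("instruction", "") + r.get("output", ""))["input_ids"])
    for r in records
)

n = len(lengths)
print("examples:", n)
print("min / median / max tokens:", lengths[0], lengths[n // 2], lengths[-1])
print("over model_max_length (100000):", sum(l > 100000 for l in lengths))
```

A wide spread here, or many samples near the truncation limit, would explain why the curve looks just as noisy on the sampled data as on the full set.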