ymcui/Chinese-LLaMA-Alpaca-2

Model breaks when I start pretraining

Abolfazl-kr opened this issue · 3 comments

Check before submitting issues

  • Make sure to pull the latest code, as some issues and bugs have been fixed.
  • I have read the Wiki and FAQ section AND searched for similar issues and did not find a similar problem or solution
  • Third-party plugin issues (e.g., llama.cpp, LangChain, text-generation-webui): we recommend checking the corresponding project for solutions

Type of Issue

Model training and fine-tuning

Base Model

Chinese-LLaMA-2 (7B/13B)

Operating System

Linux

Describe your issue in detail

When I start pre-training, the model seems to break.
I feed the model a minimal dataset (less than 1 MB), and after that the model can no longer generate English sentences.
I used your Chinese-LLaMA-Alpaca repo to create my tokenizer.

I would really appreciate it if you could help me.
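
Since the tokenizer was produced with the merging scripts from the first-generation Chinese-LLaMA-Alpaca repo, one quick sanity check (my own sketch, with placeholder paths) is to compare the tokenizer's vocabulary size against the base model's configured vocabulary. Any rows added on top of the base vocabulary are typically freshly initialized when the embeddings are resized, and less than 1 MB of text is far too little to train them, especially since embed_tokens and lm_head are fully updated via modules_to_save in the script below.

# My own sketch (paths are placeholders): compare the merged tokenizer's vocab
# size with the base model's configured vocab size. A large gap means many
# newly initialized rows in embed_tokens/lm_head that tiny datasets cannot train.
from transformers import AutoConfig, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("path/to/merged_chinese_tokenizer")
config = AutoConfig.from_pretrained("path/to/chinese-llama-2-7b")

print("tokenizer vocab size:", len(tokenizer))
print("model config vocab size:", config.vocab_size)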

Dependencies (must be provided for code-related issues)

# Read the wiki (https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/pt_scripts_zh) carefully before running the script
lr=2e-5
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=...
chinese_tokenizer_path=...
dataset_dir=...
data_cache=...
per_device_train_batch_size=1
gradient_accumulation_steps=1
block_size=64

output_dir=...

deepspeed_config_file=ds_zero2_no_offload.json

CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes 1 --nproc_per_node 2 run_clm_pt_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --data_cache_dir ${data_cache} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --do_train \
    --seed $RANDOM \
    --num_train_epochs 1 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 200 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 6 \
    --block_size ${block_size} \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --modules_to_save ${modules_to_save} \
    --torch_dtype float32 \
    --load_in_kbits 8 \
    --save_safetensors False \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False
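
For reference, the launch settings above imply a very small effective batch per optimizer step. A rough back-of-the-envelope calculation (my own note, assuming the two-GPU launch shown above):

# Rough arithmetic for the launch above (my own note, not part of the repo scripts)
per_device_train_batch_size = 1
gradient_accumulation_steps = 1
num_gpus = 2          # CUDA_VISIBLE_DEVICES=0,1 with --nproc_per_node 2
block_size = 64

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
tokens_per_step = effective_batch * block_size
print(effective_batch, tokens_per_step)   # 2 sequences and 128 tokens per optimizer step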

Name: peft
Version: 0.3.0
Name: transformers
Version: 4.35.0

Execution logs or screenshots

# Please copy-and-paste your logs here.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

Both the data and the hyperparameters affect the final quality of the model. This may be a case of overfitting; try increasing total_batch_size, or inspect the intermediate checkpoints to locate where the problem starts. Good luck.
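
To observe the effect of an intermediate model, one option (my own sketch; paths are placeholders, and the adapter path should point to wherever the adapter_config.json was saved under output_dir) is to load an intermediate LoRA adapter with peft and compare a short English generation against the unadapted base model:

# My own sketch (placeholder paths): load an intermediate LoRA checkpoint on top
# of the base model and generate a short English completion to see at which
# checkpoint the output starts to degrade.
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer
from peft import PeftModel

tokenizer = LlamaTokenizer.from_pretrained("path/to/merged_chinese_tokenizer")
base = AutoModelForCausalLM.from_pretrained(
    "path/to/chinese-llama-2-7b", torch_dtype=torch.float16, device_map="auto"
)
# The saved adapter also contains embed_tokens/lm_head (modules_to_save),
# so the base embeddings must match the merged tokenizer before loading it.
base.resize_token_embeddings(len(tokenizer))

# Directory containing adapter_config.json and the adapter weights
model = PeftModel.from_pretrained(base, "path/to/saved_lora_adapter")
model.eval()

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(base.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))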
