microsoft/DeepSpeedExamples

Step 3 PPO prints garbled answers when --print_answers is enabled

tonylin52 opened this issue · 1 comment

Hi,
I am in the PPO training phase (step 3) with a llama2-13b model. However, I get unexpected garbled answers printed from the 'generate_experience' function, as shown in the following image. May I ask how I should solve this problem?

(screenshot: garbled answers printed by 'generate_experience')

The model's answers are good with the original 'transformers' library. Here is my launch command:
deepspeed -H=./hostfile main.py \
   --data_path Fin-Contrastive/data \
   --data_split 0,0,10 \
   --actor_model_name_or_path /llama2_13b_1107 \
   --critic_model_name_or_path /step2_reward_model_finetuning/output/epoch_0 \
   --num_padding_at_beginning 0 \
   --per_device_generation_batch_size 1 \
   --per_device_training_batch_size 1 \
   --generation_batches 1 \
   --ppo_epochs 1 \
   --max_answer_seq_len 512 \
   --max_prompt_seq_len 512 \
   --actor_learning_rate ${Actor_Lr} \
   --critic_learning_rate ${Critic_Lr} \
   --actor_weight_decay 0.1 \
   --critic_weight_decay 0.1 \
   --num_train_epochs 1 \
   --lr_scheduler_type cosine \
   --gradient_accumulation_steps 1 \
   --num_warmup_steps 100 \
   --deepspeed --seed 1234 \
   --enable_hybrid_engine \
   --inference_tp_size 1 \
   --tp_gather_partition_size 4 \
   --actor_zero_stage 3 \
   --critic_zero_stage 3 \
   --actor_gradient_checkpointing \
   --actor_dropout 0.0 \
   --print_answers \
   --output_dir $OUTPUT \
   --data_output_path $DATAOUTPUT \
   &> $OUTPUT/training.log

I found the key problem: it is '--enable_hybrid_engine'. When I disable the hybrid engine, everything works fine.
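For anyone hitting the same symptom, a minimal workaround sketch is to launch step 3 with the hybrid-engine flag removed, keeping the rest of the command above unchanged. Note this is only a workaround, not a fix of the underlying hybrid-engine issue, and the performance comment below is an assumption: without the hybrid engine, generation goes through the regular ZeRO-3 path, which is typically slower.

```shell
# Workaround sketch: same launch command as above, but without
# --enable_hybrid_engine (and its related --inference_tp_size /
# --tp_gather_partition_size flags, which only apply to the hybrid engine).
deepspeed -H=./hostfile main.py \
   --data_path Fin-Contrastive/data \
   --data_split 0,0,10 \
   --actor_model_name_or_path /llama2_13b_1107 \
   --critic_model_name_or_path /step2_reward_model_finetuning/output/epoch_0 \
   --actor_zero_stage 3 \
   --critic_zero_stage 3 \
   --print_answers \
   --output_dir $OUTPUT
# ...remaining flags as in the original command, minus the three
# hybrid-engine options. Expect slower generation, but readable answers.
```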