beyondguo/LLM-Tuning

Error during PPO training

yyy900 opened this issue · 3 comments

What might be causing the error below during the final PPO step? I haven't changed the code.

I only have a single 80 GB A100. The command is as follows (I'm not sure whether `merged_sft_model_path` is correct):

CUDA_VISIBLE_DEVICES=0 python rl_training.py \
    --base_model_name /home/wyang/code/LLM-Tuning/baichuan-inc/baichuan-7B \
    --merged_sft_model_path /home/wyang/code/LLM-Tuning/baichuan-inc/baichuan-7B \
    --sft_model_lora_path /home/wyang/code/LLM-Tuning/weights/hc3_chatgpt_zh_specific_qa_baichuan-7B/checkpoint-2000 \
    --reward_model_lora_path /home/wyang/code/LLM-Tuning/weights/baichuan-7B_beyond_reward_chinese_-1/checkpoint-4550 \
    --adafactor False \
    --save_freq 10 \
    --output_max_length 256 \
    --batch_size 2 \
    --gradient_accumulation_steps 16 \
    --batched_gen True \
    --ppo_epochs 1 \
    --seed 0 \
    --learning_rate 1e-5 \
    --early_stopping True \
    --output_dir weights/baichaun_rlhf_beyond_chinese_test_6 \
    --log_with wandb

The error:

Using pad_token, but it is not set yet.
fatal: Not a git repository (or any of the parent directories): .git
Loading base model for ppo training...
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.15.7
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
Loading base model for reward model...
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Some weights of BaichuanForSequenceClassification were not initialized from the model checkpoint at /home/wyang/code/LLM-Tuning/baichuan-inc/baichuan-7B and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
{'': 0}
{'': 0}
0it [00:00, ?it/s]Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
---------------------
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
0
[tensor([31106, 31394,    77, 31604, 10857, 11748,  2852, 19463,  9945, 25618,
        31763,    75,     5,     5, 31902,    77], device='cuda:0'), tensor([31106, 31394,    77,  8929,  2087,  8415, 31158, 32298, 31749, 31822,
        32431,    72, 27001,  1224, 31973, 31757, 31373, 31357,  2855, 31779,
        31799, 31135, 12105,    75,     5,     5, 31902,    77],
       device='cuda:0')]
---------------------
0it [00:06, ?it/s]
wandb: Waiting for W&B process to finish... (success).
wandb: You can sync this run to the cloud by running:

It runs fine for me on a single GPU.

Your error is `found at least two devices, cuda:0 and cpu!`

Check the device of each model (`ppo_model`, `ref_model`, `reward_model`) and print them out to see.
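A minimal sketch of such a check (assuming each model exposes either an Accelerate-style `hf_device_map` attribute or PyTorch-style `named_parameters()`; the helper name is hypothetical):

```python
def report_devices(name, model):
    """Print where a model's weights live.

    Prefers the `hf_device_map` attribute when present (set by
    `from_pretrained(..., device_map=...)`); otherwise falls back to
    collecting the distinct devices of all parameters.
    """
    device_map = getattr(model, "hf_device_map", None)
    if device_map is not None:
        print(f"{name}: {device_map}")
        return device_map
    # Fallback: gather the device of every parameter.
    devices = {str(p.device) for _, p in model.named_parameters()}
    print(f"{name}: {devices}")
    return devices
```

Called as `report_devices("ref_model", ref_model)` for each of the three models, a result that contains both `cuda:0` and `cpu` would pinpoint which model causes the device mismatch.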

How do I check the device of this model?

ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    script_args.merged_sft_model_path,
    trust_remote_code=True
)

Accessing `hf_device_map` on it fails with:

AttributeError: 'AutoModelForCausalLMWithValueHead' object has no attribute 'hf_device_map'

The other two are both on cuda:0:
base_model: {'': 0}
reward_model: {'': 0}

Print `model.device`.
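Since the trl value-head wrapper doesn't expose `hf_device_map`, another workaround (a sketch, assuming PyTorch-style `named_parameters()`; the helper name is hypothetical) is to read the device of a parameter directly:

```python
def first_param_device(model):
    """Return the device of the model's first parameter as a string,
    or None if the model has no parameters."""
    for _, param in model.named_parameters():
        return str(param.device)
    return None
```

Note that in trl the wrapper stores the underlying transformers model in its `pretrained_model` attribute, so `ref_model.pretrained_model.hf_device_map` may also work if the base model was loaded with a `device_map`.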