PPO training error
yyy900 opened this issue · 3 comments
yyy900 commented
What could cause the following error in the final PPO step? I haven't modified the code.
I only have a single 80G A100. The command is as follows (I'm not sure whether `merged_sft_model_path` is correct):
CUDA_VISIBLE_DEVICES=0 python rl_training.py \
--base_model_name /home/wyang/code/LLM-Tuning/baichuan-inc/baichuan-7B \
--merged_sft_model_path /home/wyang/code/LLM-Tuning/baichuan-inc/baichuan-7B \
--sft_model_lora_path /home/wyang/code/LLM-Tuning/weights/hc3_chatgpt_zh_specific_qa_baichuan-7B/checkpoint-2000 \
--reward_model_lora_path /home/wyang/code/LLM-Tuning/weights/baichuan-7B_beyond_reward_chinese_-1/checkpoint-4550 \
--adafactor False \
--save_freq 10 \
--output_max_length 256 \
--batch_size 2 \
--gradient_accumulation_steps 16 \
--batched_gen True \
--ppo_epochs 1 \
--seed 0 \
--learning_rate 1e-5 \
--early_stopping True \
--output_dir weights/baichaun_rlhf_beyond_chinese_test_6 \
--log_with wandb
The error:
Using pad_token, but it is not set yet.
fatal: Not a git repository (or any of the parent directories): .git
Loading base model for ppo training...
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.15.7
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
Loading base model for reward model...
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Some weights of BaichuanForSequenceClassification were not initialized from the model checkpoint at /home/wyang/code/LLM-Tuning/baichuan-inc/baichuan-7B and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
{'': 0}
{'': 0}
0it [00:00, ?it/s]Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
---------------------
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
0
[tensor([31106, 31394, 77, 31604, 10857, 11748, 2852, 19463, 9945, 25618,
31763, 75, 5, 5, 31902, 77], device='cuda:0'), tensor([31106, 31394, 77, 8929, 2087, 8415, 31158, 32298, 31749, 31822,
32431, 72, 27001, 1224, 31973, 31757, 31373, 31357, 2855, 31779,
31799, 31135, 12105, 75, 5, 5, 31902, 77],
device='cuda:0')]
---------------------
0it [00:06, ?it/s]
wandb: Waiting for W&B process to finish... (success).
wandb: You can sync this run to the cloud by running:
beyondguo commented
It runs fine on a single GPU on my end.
Your error is `found at least two devices, cuda:0 and cpu!`
Check the device of each model (`ppo_model`, `ref_model`, `reward_model`) and print them out to see.
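A minimal sketch of one way to do that device check. The helper name and the toy module are placeholders, not part of rl_training.py; in the real script you would call it on `ppo_model`, `ref_model`, and `reward_model`:

```python
import torch.nn as nn

def report_devices(name: str, model: nn.Module) -> None:
    # Collect every device the model's parameters live on; a clean
    # single-GPU run should show exactly one entry, e.g. {'cuda:0'}.
    devices = {str(p.device) for p in model.parameters()}
    print(f"{name}: {devices}")

# Toy stand-in module; substitute your actual models here.
report_devices("toy_model", nn.Linear(4, 4))  # prints: toy_model: {'cpu'}
```

If the printed set contains both `cuda:0` and `cpu` for any one model, that model is the source of the mismatch.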
yyy900 commented
How do I check that? Accessing `.hf_device_map` on
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    script_args.merged_sft_model_path,
    trust_remote_code=True
)
gives:
AttributeError: 'AutoModelForCausalLMWithValueHead' object has no attribute 'hf_device_map'
The other two are both on cuda:0:
base_model: {'': 0}
reward_model: {'': 0}
beyondguo commented
Print `model.device`.
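A hedged sketch of why the `AttributeError` appears and how to read the device anyway. The class below is a toy stand-in for trl's `AutoModelForCausalLMWithValueHead` (which, assuming current trl, keeps the base LM under a `pretrained_model` attribute and does not define `hf_device_map` on the wrapper itself); reading the device off any parameter works regardless:

```python
import torch.nn as nn

class ToyValueHeadModel(nn.Module):
    """Toy stand-in for a value-head wrapper: it holds the base LM
    as .pretrained_model and has no hf_device_map of its own."""
    def __init__(self):
        super().__init__()
        self.pretrained_model = nn.Linear(2, 2)  # stand-in base LM
        self.v_head = nn.Linear(2, 1)            # stand-in value head

ref_model = ToyValueHeadModel()
# Any parameter carries its device, so this works even when the
# wrapper exposes neither .device nor .hf_device_map:
print(next(ref_model.parameters()).device)  # prints: cpu
```

If this prints `cpu` for `ref_model` while the other models report `cuda:0`, moving `ref_model` to the GPU (or loading it with an explicit `device_map`) should resolve the mixed-device error.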