[BUG] Text generation fails during the RAFT step
Opened this issue · 1 comment
biaoliu-kiritsugu commented
Describe the bug
When I run the examples/raft_align.py script with my fine-tuned LLaMA-3 model,
I encounter the following error:
Traceback (most recent call last):
File "/home/work/user-job-dir/app/liubiao/llm/LMflow/examples/raft_align.py", line 220, in <module>
main()
File "/home/work/user-job-dir/app/liubiao/llm/LMflow/examples/raft_align.py", line 183, in main
outputs = model.generate(**inputs, **generation_kwargs)
File "/home/naie/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/naie/.local/lib/python3.9/site-packages/transformers/generation/utils.py", line 1758, in generate
result = self._sample(
File "/home/naie/.local/lib/python3.9/site-packages/transformers/generation/utils.py", line 2397, in _sample
outputs = self(
File "/home/naie/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/naie/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/naie/.local/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1164, in forward
outputs = self.model(
File "/home/naie/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/naie/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/naie/.local/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 968, in forward
layer_outputs = decoder_layer(
File "/home/naie/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/naie/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/naie/.local/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 713, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/naie/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/naie/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/naie/.local/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 331, in forward
query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[2, 206, 32, 128]' is invalid for input of size 412
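For reference, 2 × 206 = 412, so the tensor reaching this .view() call seems to have a per-token hidden size of 1 instead of num_heads × head_dim = 32 × 128 = 4096, which suggests the attention projection weights were not fully materialized (for example, still partitioned by ZeRO-3) at the moment generate was called.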
This is the launch script I used:
accelerate launch --main_process_port 8836 \
--config_file configs/deepspeed_zeor3.yaml --num_processes 1 \
examples/raft_align.py \
--model_name_or_path ${model_name_or_path} \
--reward_model_or_path ${reward_model_or_path} \
--tokenizer_name ${tokenizer_name} \
--num_raft_iteration 20 \
--learning_rate 2e-5 \
--block_size 512 \
--fp16 \
--dataset_path ${dataset_path} \
--output_reward_path log/raft_aligner/reward.txt \
--output_dir ${output_dir} --overwrite_output_dir \
--run_name "${exp_id}_${timestamp}" \
--num_train_epochs 4 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--validation_split_percentage 0 \
--logging_steps 1 \
--do_train \
--ddp_timeout 72000 \
--save_steps 7777 \
--dataloader_num_workers 1 \
--preprocessing_num_workers 12 \
--inference_batch_size_per_device 1 \
--collection_strategy "local" \
--raft_batch_size 1024 \
--output_min_length 96 \
--output_max_length 512 \
--top_reward_percentage 0.125
However, when I run the following standalone test script, the text is generated successfully during the generate step without any errors:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
model_name = "/home/work/user-job-dir/app/liubiao/huggingface/merge_instruct_llama3_sft"
tokenizer_name = "/home/naie/work/liubiao/huggingface/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
device = "npu"
model.to(device)
input_texts = ["<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nShould you buy a case to protect your cell phone?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nIt depends on your circumstances. If you carry your phone in a pocket or a purse then you probably want a case. But if you only need a phone for quick interactions, a case may actually cause more harm than good. What do you need the phone for? Are you a parent, or do you work from home?<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat harm could it do?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nA phone case can damage the screen, for one thing. It can also get you in trouble if you have your phone turned off for some reason. Then you will turn it back on and it won’t do anything. If you can afford to replace it, then you need a case to protect it. The problem is that most people aren’t able to afford to replace their phones all the time.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nThanks for letting me know.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"] * 2
# generation_kwargs = {
# "max_new_tokens": 50
# }
stop_token = "<|eot_id|>"
stop_token_id = tokenizer.encode(stop_token)[0]
# tokenizer.add_special_tokens({"eos_token": "<|eot_id|>"})
# print(tokenizer.eos_token)
generation_kwargs = {
    "max_new_tokens": 96,
    "min_length": 1,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "eos_token_id": stop_token_id,
    "temperature": 0.85,
    "repetition_penalty": 1.2
}
tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer(input_texts, return_tensors="pt", padding=True).to(device)
print("Input IDs size:", inputs["input_ids"].size())
with torch.no_grad():
    outputs = model.generate(**inputs, **generation_kwargs)
print("Generated Outputs size:", outputs.size())
outputs = outputs.cpu()
generated_texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for i, text in enumerate(generated_texts):
    print(f"Generated text {i+1}: {text}")
Expected behavior
Text is generated successfully during the RAFT step.
biaoliu-kiritsugu commented
It seems to be a problem with DeepSpeed. When I use ZeRO-3 mode, model.generate
does not work properly. However, when I use the multi_gpu mode instead, it works well.
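In case it helps: under ZeRO stage 3 the model parameters are partitioned across processes, so calling generate on the bare module can see sharded (effectively empty) weights. Below is a minimal sketch of a common workaround, assuming model is the underlying transformers model already initialized under DeepSpeed and that deepspeed is importable; the exact integration point in raft_align.py may differ.

import deepspeed

# Temporarily gather the full, un-partitioned parameters on every rank for
# the duration of generation; modifier_rank=None means no rank modifies them.
with deepspeed.zero.GatheredParameters(list(model.parameters()), modifier_rank=None):
    with torch.no_grad():
        # synced_gpus=True keeps multiple ranks in lock-step during sampling
        # under ZeRO-3; it is harmless with a single process.
        outputs = model.generate(**inputs, **generation_kwargs, synced_gpus=True)

Gathering all parameters keeps a full copy of the weights on each rank, so for larger models it may be preferable to generate through the DeepSpeed engine itself or, as noted above, to fall back to the multi_gpu / ZeRO-2 configuration.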