QwenLM/Qwen-VL

[BUG] Error when running the model locally for inference


Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

The error log is below. I downloaded the model and ran inference locally on a single RTX 3060 GPU.
```
/home/bowen/anaconda3/envs/qwen/bin/python /media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/Qwen-VL/test.py
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████| 8/8 [00:01<00:00, 5.55it/s]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Traceback (most recent call last):
  File "/media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/Qwen-VL/test.py", line 22, in <module>
    query = tokenizer.from_list_format([
AttributeError: 'QWenTokenizer' object has no attribute 'from_list_format'

Process finished with exit code 1
```
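For reference, `from_list_format` is defined in Qwen-VL-Chat's remote tokenizer code (`tokenization_qwen.py`), so this AttributeError suggests the tokenizer class picked up from my local `./Qwen` directory may not be the Qwen-VL one (e.g. a text-only Qwen snapshot or an outdated download). A minimal check along those lines (my own diagnostic sketch, not part of the original run):

```python
# Sketch: verify which remote-code tokenizer class is actually loaded from the
# local directory, and whether it carries Qwen-VL's from_list_format helper.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./Qwen", trust_remote_code=True)
print(type(tokenizer))                          # which tokenization_qwen.py was picked up
print(hasattr(tokenizer, "from_list_format"))   # False -> not the Qwen-VL tokenizer
```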

The inference script used:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
torch.manual_seed(1234)

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("./Qwen", trust_remote_code=True)

# use bf16
model = AutoModelForCausalLM.from_pretrained("./Qwen", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="cpu", trust_remote_code=True).eval()
# use cuda device
# model = AutoModelForCausalLM.from_pretrained("./Qwen", device_map="cuda", trust_remote_code=True).eval()

# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained("./Qwen", trust_remote_code=True)

# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},  # Either a local path or an url
    {'text': '这是什么?'},  # "What is this?"
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
# Expected output: 图中是一名女子在沙滩上和狗玩耍,旁边是一只拉布拉多犬,它们处于沙滩上。
# (A woman playing with a dog on the beach; the dog beside her is a Labrador.)

# 2nd dialogue turn
response, history = model.chat(tokenizer, '框出图中击掌的位置', history=history)  # "Box the high-five in the image"
print(response)
# Expected output: <ref>击掌</ref><box>(536,509),(588,602)</box>

image = tokenizer.draw_bbox_on_latest_picture(response, history)
if image:
    image.save('1.jpg')
else:
    print("no box")
```

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
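The fields above can be filled in with a small script using standard APIs only:

```python
# Collect the versions requested by the issue template.
import platform
import torch
import transformers

print("OS:", platform.platform())
print("Python:", platform.python_version())
print("Transformers:", transformers.__version__)
print("PyTorch:", torch.__version__)
print("CUDA:", torch.version.cuda)
```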

Anything else?

No response