LinkSoul-AI/Chinese-Llama-2-7b

Quick-test snippet OOMs

LiuZhihhxx opened this issue · 2 comments

Running the quick-test code from readme.md:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_path = "LinkSoul/Chinese-Llama-2-7b"

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

            If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""

prompt = instruction.format("用中文回答,When is the best time to visit Beijing, and do you have any suggestions for me?")
generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(), max_new_tokens=4096, streamer=streamer)
```

At `model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()`, memory usage spikes to 100% and the process is killed, with the following output:

```
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565
[2023-07-28 10:55:34,419] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards:   0%| | 0/3 [00:00<?, ?it/s]
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
```

What could be the cause?
(Fine-tuning with https://github.com/lvwerra/trl runs without any problem and nearly saturates GPU memory, so CUDA itself should be fine.)
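Exit code 137 usually means the Linux OOM killer terminated the process: `from_pretrained()` first materializes the full fp32 checkpoint in CPU RAM, and only afterwards does `.half()` run, so a 7B model transiently needs on the order of 26 GB of system memory. A minimal sketch of the arithmetic, plus a lower-memory loading helper (`load_low_mem` is a hypothetical name; `torch_dtype` and `low_cpu_mem_usage` are standard `from_pretrained` keyword arguments in recent `transformers`):

```python
# Rough RAM estimate for a 7B-parameter model:
# 4 bytes/param (fp32) vs 2 bytes/param (fp16).
n_params = 7_000_000_000
fp32_gb = n_params * 4 / 1024**3  # ~26 GB transient peak with the default load
fp16_gb = n_params * 2 / 1024**3  # ~13 GB if weights are loaded directly as fp16
print(f"fp32: {fp32_gb:.0f} GB, fp16: {fp16_gb:.0f} GB")


def load_low_mem(model_path: str):
    """Hypothetical helper: load shards directly in fp16 instead of
    materializing the whole fp32 model in CPU RAM first."""
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,  # skip the fp32 copy entirely
        low_cpu_mem_usage=True,     # stream checkpoint shards instead of pre-allocating
    ).cuda()


if __name__ == "__main__":
    model = load_low_mem("LinkSoul/Chinese-Llama-2-7b")
```

If system RAM is still too tight, `device_map="auto"` (requires the `accelerate` package) can additionally place weights directly on the GPU as they are loaded.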

541wsy commented

I have the same problem.

Did you solve it? I managed to load the model, but it hangs during inference.