jianzhnie/LLamaTuner

Tokenizer class BaiChuanTokenizer does not exist or is not currently imported.

Opened this issue · 6 comments

corlin commented

The error message is as follows:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:396 in │
│ │
│ 393 │
│ 394 │
│ 395 if __name__ == '__main__': │
│ ❱ 396 │ main() │
│ 397 │
│ │
│ /Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:312 in main │
│ │
│ 309 │ set_seed(args.seed) │
│ 310 │ │
│ 311 │ # Tokenizer │
│ ❱ 312 │ tokenizer = AutoTokenizer.from_pretrained( │
│ 313 │ │ args.model_name_or_path, │
│ 314 │ │ cache_dir=args.cache_dir, │
│ 315 │ │ padding_side='right', │
│ │
│ /Users/corlin/code/transformers/src/transformers/models/auto/tokenization_auto.py:688 in │
│ from_pretrained │
│ │
│ 685 │ │ │ │ tokenizer_class_candidate = config_tokenizer_class │
│ 686 │ │ │ │ tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate) │
│ 687 │ │ │ if tokenizer_class is None: │
│ ❱ 688 │ │ │ │ raise ValueError( │
│ 689 │ │ │ │ │ f"Tokenizer class {tokenizer_class_candidate} does not exist or is n │
│ 690 │ │ │ │ ) │
│ 691 │ │ │ return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *input │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Tokenizer class BaiChuanTokenizer does not exist or is not currently imported.

corlin commented

Environment: macOS on an M1 machine.

You should first download the BaiChuanTokenizer and the BaiChuan model checkpoint from https://huggingface.co/baichuan-inc/baichuan-7B:

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers load the custom tokenizer and model code shipped with the checkpoint
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/baichuan-7B", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
# Move the inputs to the model's device rather than assuming a CUDA GPU is present
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

corlin commented

> You should download the BaiChuanTokenizer and BaiChuan model checkpoint from https://huggingface.co/baichuan-inc/baichuan-7B first

But the files in my model directory are all there.
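One way to check a downloaded directory beyond "the files are there" is to look at what its tokenizer_config.json declares. The sketch below is only an illustration: `inspect_tokenizer_dir` is a hypothetical helper, and it assumes the checkpoint follows the usual transformers convention of an `auto_map` entry pointing at a Python module that sits next to the config.

```python
import json
from pathlib import Path


def inspect_tokenizer_dir(model_dir):
    """Summarize what tokenizer_config.json declares (hypothetical helper)."""
    model_dir = Path(model_dir)
    cfg = json.loads((model_dir / "tokenizer_config.json").read_text(encoding="utf-8"))

    # auto_map["AutoTokenizer"] is typically "module.Class" or [slow, fast],
    # where either list element may be null.
    entry = (cfg.get("auto_map") or {}).get("AutoTokenizer")
    if isinstance(entry, (list, tuple)):
        refs = [ref for ref in entry if ref]
    elif entry:
        refs = [entry]
    else:
        refs = []

    # Check that each referenced module file is present in the directory.
    module_files = {}
    for ref in refs:
        module = ref.split("--")[-1].rsplit(".", 1)[0]
        module_files[module + ".py"] = (model_dir / (module + ".py")).is_file()

    return {
        "tokenizer_class": cfg.get("tokenizer_class"),
        "custom_code": refs,
        "module_files": module_files,
    }
```

If `custom_code` is non-empty, the tokenizer class lives in the checkpoint directory rather than inside transformers, so it can only be loaded when remote code is allowed.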

Run the following example to check that the model and tokenizer load correctly and that inference works:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace "your_download_model_path" with the local directory that holds the downloaded checkpoint
tokenizer = AutoTokenizer.from_pretrained("your_download_model_path", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("your_download_model_path", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
# Move the inputs to the model's device rather than assuming a CUDA GPU is present
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

RIU-13 commented

I got this error when I did not pass trust_remote_code; after adding it, the error went away.
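Applied to the call in the traceback, that fix might look roughly like the sketch below. It is not the repository's actual code: `tokenizer_kwargs` and `load_tokenizer` are hypothetical helpers, and the argument names mirror the three visible in the traceback (`model_name_or_path`, `cache_dir`, `padding_side`).

```python
from typing import Any, Dict, Optional


def tokenizer_kwargs(cache_dir: Optional[str] = None,
                     trust_remote_code: bool = True) -> Dict[str, Any]:
    """Keyword arguments for AutoTokenizer.from_pretrained (hypothetical helper)."""
    return {
        "cache_dir": cache_dir,
        "padding_side": "right",
        # Allows transformers to import the tokenizer class that ships with
        # the checkpoint (custom code such as BaiChuanTokenizer).
        "trust_remote_code": trust_remote_code,
    }


def load_tokenizer(model_name_or_path: str, cache_dir: Optional[str] = None):
    # Imported lazily so tokenizer_kwargs can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_name_or_path,
                                         **tokenizer_kwargs(cache_dir))
```

Because trust_remote_code executes Python code from the checkpoint directory, it is worth enabling only for checkpoints from a source you trust.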