Tokenizer class BaiChuanTokenizer does not exist or is not currently imported.

Question

Tokenizer class BaiChuanTokenizer does not exist or is not currently imported.

Opened this issue a year ago · 6 comments

错误信息如下

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:396 in │
│ │
│ 393 │
│ 394 │
│ 395 if name == 'main': │
│ ❱ 396 │ main() │
│ 397 │
│ │
│ /Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:312 in main │
│ │
│ 309 │ set_seed(args.seed) │
│ 310 │ │
│ 311 │ # Tokenizer │
│ ❱ 312 │ tokenizer = AutoTokenizer.from_pretrained( │
│ 313 │ │ args.model_name_or_path, │
│ 314 │ │ cache_dir=args.cache_dir, │
│ 315 │ │ padding_side='right', │
│ │
│ /Users/corlin/code/transformers/src/transformers/models/auto/tokenization_auto.py:688 in │
│ from_pretrained │
│ │
│ 685 │ │ │ │ tokenizer_class_candidate = config_tokenizer_class │
│ 686 │ │ │ │ tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate) │
│ 687 │ │ │ if tokenizer_class is None: │
│ ❱ 688 │ │ │ │ raise ValueError( │
│ 689 │ │ │ │ │ f"Tokenizer class {tokenizer_class_candidate} does not exist or is n │
│ 690 │ │ │ │ ) │
│ 691 │ │ │ return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *input │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Tokenizer class BaiChuanTokenizer does not exist or is not currently imported.

Answer 1 · 2023-06-17T05:19:43.000Z

macos M1环境

Answer 2 · 2023-06-17T06:12:31.000Z

You should download the BaiChuanTokenizer and BaiChuan Model Checkpont from the https://huggingface.co/baichuan-inc/baichuan-7B first

Answer 3 · 2023-06-17T06:13:01.000Z

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/baichuan-7B", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64,repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Answer 4 · 2023-06-19T02:32:06.000Z

You should download the BaiChuanTokenizer and BaiChuan Model Checkpont from the https://huggingface.co/baichuan-inc/baichuan-7B first

相关模型目录文件是全的啊。

Answer 5 · 2023-06-20T02:57:57.000Z

Run folowing example to test the model and tokenizer is well loaded and well inference

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your_download_model_path", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("your_download_model_path", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64,repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Answer 6 · 2023-08-20T09:52:45.000Z

我没加trust_remote_code会报错，加了就好了