How to tokenize with Vicuna
beyondli opened this issue · 2 comments
Hi,
I want to test the token speed of minigpt4, but loading the tokenizer fails. I tried two calls and both failed.
AutoTokenizer.from_pretrained('maknee/ggml-vicuna-v0-quantized/13B') fails with:
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'maknee/ggml-vicuna-v0-quantized/13B'. Use repo_type argument if needed.
AutoTokenizer.from_pretrained('maknee/ggml-vicuna-v0-quantized') fails with:
Repository Not Found for url: https://huggingface.co/maknee/ggml-vicuna-v0-quantized/resolve/main/tokenizer_config.json.
Please make sure you specified the correct repo_id and repo_type.
What is the correct command for the tokenizer? Thanks.
The tokenizer used is the llama.cpp tokenizer. To evaluate its speed, call add_strings in C++ and time the call. Unfortunately, you can't use AutoTokenizer from Hugging Face to load the tokenizer directly :(
Internally, the tokenizer calls llama_tokenize_internal in llama.cpp.
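For the timing part, here is a minimal sketch of what "set a timer" could look like in C++ with std::chrono. The signature of add_strings is not shown in this thread, so the sketch does not call it. Instead, tokenize_placeholder is a hypothetical stand-in that just splits on whitespace, and time_tokenizer and TokenizeTiming are illustrative names, not part of minigpt4 or llama.cpp. Swap the placeholder for the real tokenizer call to get a meaningful number.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the real tokenizer call. It only splits on
// whitespace so the sketch compiles on its own; replace the body with the
// actual C++ tokenizer call you want to measure.
std::vector<std::string> tokenize_placeholder(const std::string& text) {
    std::istringstream stream(text);
    std::vector<std::string> tokens;
    std::string token;
    while (stream >> token) {
        tokens.push_back(token);
    }
    return tokens;
}

// Result of one timing run (illustrative type, not from any library).
struct TokenizeTiming {
    std::size_t total_tokens;   // tokens produced across all iterations
    double elapsed_seconds;     // wall-clock time for the whole loop
    double tokens_per_second;   // 0 if the clock reported no elapsed time
};

// Runs `tokenize` on `text` `iterations` times and measures the loop with a
// monotonic clock, so system clock adjustments do not skew the result.
template <typename Tokenizer>
TokenizeTiming time_tokenizer(Tokenizer&& tokenize, const std::string& text,
                              int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;

    std::size_t total = 0;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        total += tokenize(text).size();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;

    const double seconds = elapsed.count();
    const double rate = seconds > 0.0 ? total / seconds : 0.0;
    return {total, seconds, rate};
}
```

Calling time_tokenizer(tokenize_placeholder, prompt, 1000) from your own main returns the token count, elapsed seconds, and tokens per second for the loop. Running many iterations helps because a single tokenization call on a short prompt is usually too fast to time reliably.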
Hi Maknee,
Thanks!