Can't run deepseek-vl with script on M2
WayneCui opened this issue · 6 comments
import mlx.core as mx
from mlx_vlm import load, generate
model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)
prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": "<image>\nWhat are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)
output = generate(model, processor, "http://images.cocodataset.org/val2017/000000039769.jpg", prompt, verbose=False)
print(output)
Traceback (most recent call last):
File "/Users/wayne/2-learning/Projects/gpt/DeepSeek-VL/inference2.py", line 8, in
prompt = processor.tokenizer.apply_chat_template(
AttributeError: 'LlamaTokenizerFast' object has no attribute 'tokenizer'
>>> processor
LlamaTokenizerFast(name_or_path='/Users/wayne/.cache/huggingface/hub/models--mlx-community--deepseek-vl-7b-chat-4bit/snapshots/79feff56645faf5f145c834118ca3d43c8c55984', vocab_size=100000, model_max_length=16384, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<|begin▁of▁sentence|>', 'eos_token': '<|end▁of▁sentence|>', 'additional_special_tokens': ['<image>']}, clean_up_tokenization_spaces=False), added_tokens_decoder={
100000: AddedToken("<|begin▁of▁sentence|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
100001: AddedToken("<|end▁of▁sentence|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
100002: AddedToken("ø", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100003: AddedToken("ö", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100004: AddedToken("ú", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100005: AddedToken("ÿ", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100006: AddedToken("õ", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100007: AddedToken("÷", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100008: AddedToken("û", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100009: AddedToken("ý", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100010: AddedToken("À", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100011: AddedToken("ù", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100012: AddedToken("Á", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100013: AddedToken("þ", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100014: AddedToken("ü", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
100015: AddedToken("<image_placeholder>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
100016: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
The processor doesn't have a tokenizer attribute, and this model doesn't use a newline in the prompt. Try this:
import mlx.core as mx
from mlx_vlm import load, generate
model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)
prompt = processor.apply_chat_template(
    [{"role": "user", "content": "<image>What are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)
output = generate(model, processor, "http://images.cocodataset.org/val2017/000000039769.jpg", prompt, verbose=False)
print(output)
Let me know how it goes
(deepseek) ➜ DeepSeek-VL git:(main) ✗ python inference2.py
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 107088.61it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
File "/Users/wayne/2-learning/Projects/gpt/DeepSeek-VL/inference2.py", line 13, in <module>
output = generate(model, processor, "http://images.cocodataset.org/val2017/000000039769.jpg", prompt, verbose=False)
File "/Users/wayne/anaconda3/envs/deepseek/lib/python3.9/site-packages/mlx_vlm/utils.py", line 830, in generate
prompt_tokens = mx.array(processor.tokenizer.encode(prompt))
AttributeError: 'LlamaTokenizerFast' object has no attribute 'tokenizer'
Thanks for your reply! It seems the processor.tokenizer.encode call lives in mlx_vlm/utils.py itself.
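In the meantime, a possible stopgap (an untested sketch on my side, not an official mlx_vlm API) would be to point processor.tokenizer back at the tokenizer itself so that lookup resolves:

# Untested workaround sketch: alias the tokenizer onto itself so that
# generate()'s processor.tokenizer.encode(prompt) lookup resolves.
processor.tokenizer = processor

But that would only paper over the attribute lookup, so I suspect something else is missing too.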
Hey @WayneCui
The image_processor object was missing; this should work fine:
import mlx.core as mx
from mlx_vlm.utils import load, generate, load_image_processor
model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)
image_processor = load_image_processor(model_path)
prompt = processor.apply_chat_template(
    [{"role": "user", "content": "<image>What are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)
output = generate(
    model,
    processor,
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    prompt,
    image_processor,
    verbose=False,
)
print(output)
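load_image_processor pulls in the image preprocessor that load() doesn't return for this model (as your repr showed, processor here is just the tokenizer), and passing it to generate gives it what it needs to preprocess the image.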
The docs are coming soon, with examples for all models and how-to guides.
It works for me, thanks a lot!
Most welcome;)