"ValueError: Failed to load model from file" for new Phi-3 models
Closed this issue · 2 comments
I can use Guidance with Phi-3-mini, which was announced a while ago, but with the newer ones (the Phi-3-medium class) I get:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File <placeholder>/test_guidance.py:33
30 return "<|end|>\n"
32 # tlm = LlamaCppChat(
---> 33 tlm = Phi3Chat(
34 # model="<placeholder>/LLM/models/Phi-3-mini-4k-instruct-q4.gguf",
35 model="<placeholder>/LLM/models/phi-3-medium-4k-instruct.Q4_0.gguf",
36 n_gpu_layers=128,
37 seed=42,
38 n_ctx=4096,
39 use_mlock=True,
40 no_mmap=True,
41 echo=True,
42 )
44 class Llama3Chat(LlamaCpp, Chat):
45 def get_role_start(self, role_name, **kwargs): # type: ignore
File <placeholder>/.venv/lib/python3.12/site-packages/guidance/models/llama_cpp/_llama_cpp.py:229, in LlamaCpp.__init__(self, model, echo, compute_log_probs, api_key, chat_template, **llama_cpp_kwargs)
227 engine = RemoteEngine(model, api_key=api_key, **llama_cpp_kwargs)
228 else:
--> 229 engine = LlamaCppEngine(
230 model, compute_log_probs=compute_log_probs, chat_template=chat_template, **llama_cpp_kwargs
231 )
233 super().__init__(engine, echo=echo)
File <placeholder>/.venv/lib/python3.12/site-packages/guidance/models/llama_cpp/_llama_cpp.py:122, in LlamaCppEngine.__init__(self, model, compute_log_probs, chat_template, **kwargs)
117 kwargs["verbose"] = (
118 True # llama-cpp-python can't hide output in this case
119 )
121 with normalize_notebook_stdout_stderr():
--> 122 self.model_obj = llama_cpp.Llama(model_path=model, logits_all=True, **kwargs)
123 elif isinstance(model, llama_cpp.Llama):
124 self.model = model.__class__.__name__
File <placeholder>/.venv/lib/python3.12/site-packages/llama_cpp/llama.py:338, in Llama.__init__(self, model_path, n_gpu_layers, split_mode, main_gpu, tensor_split, vocab_only, use_mmap, use_mlock, kv_overrides, seed, n_ctx, n_batch, n_threads, n_threads_batch, rope_scaling_type, pooling_type, rope_freq_base, rope_freq_scale, yarn_ext_factor, yarn_attn_factor, yarn_beta_fast, yarn_beta_slow, yarn_orig_ctx, logits_all, embedding, offload_kqv, flash_attn, last_n_tokens_size, lora_base, lora_scale, lora_path, numa, chat_format, chat_handler, draft_model, tokenizer, type_k, type_v, verbose, **kwargs)
335 if not os.path.exists(model_path):
336 raise ValueError(f"Model path does not exist: {model_path}")
--> 338 self._model = _LlamaModel(
339 path_model=self.model_path, params=self.model_params, verbose=self.verbose
340 )
342 # Override tokenizer
343 self.tokenizer_ = tokenizer or LlamaTokenizer(self)
File <placeholder>/.venv/lib/python3.12/site-packages/llama_cpp/_internals.py:57, in _LlamaModel.__init__(self, path_model, params, verbose)
52 self.model = llama_cpp.llama_load_model_from_file(
53 self.path_model.encode("utf-8"), self.params
54 )
56 if self.model is None:
---> 57 raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: <placeholder>/LLM/models/phi-3-medium-4k-instruct.Q4_0.gguf
Hey @ibehnam -- would you mind pointing me to where you got the 4-bit GGUF? A helpful test would be to see whether llama-cpp-python can load the file directly, with something like the following code:
from llama_cpp import Llama
llm = Llama(
model_path="<placeholder>/LLM/models/phi-3-medium-4k-instruct.Q4_0.gguf",
logits_all=True,
n_gpu_layers=128,
n_ctx=4096,
)
And perhaps a quick test of a generation:
output = llm(
"Q: Name the planets in the solar system? A: ", # Prompt
max_tokens=32, # Generate up to 32 tokens, set to None to generate up to the end of the context window
stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
echo=True # Echo the prompt back in the output
) # Generate a completion, can also call create_completion
print(output)
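Before digging into version mismatches, a quick sanity check on the file itself can rule out a truncated or corrupted download. This is a standalone sketch (not part of guidance or llama-cpp-python; `read_gguf_header` is a hypothetical helper) that reads only the fixed-size GGUF header -- magic, format version, tensor count, and metadata key/value count:

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header from a model file.

    Layout (little-endian): 4-byte magic b"GGUF", uint32 version,
    uint64 tensor count, uint64 metadata key/value count. A truncated
    download or an older GGML/GGJT file will fail here, which helps
    narrow down "Failed to load model from file" errors.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"Not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
        (n_tensors,) = struct.unpack("<Q", f.read(8))
        (n_kv,) = struct.unpack("<Q", f.read(8))
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}
```

If the header parses cleanly but the load still fails, the file is probably fine and the problem is more likely an unsupported model architecture in the bundled llama.cpp build.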
A look at your stack trace suggests that the issue may be coming from the upstream repo we depend on to interface with llama.cpp (https://github.com/abetlen/llama-cpp-python), but I'm happy to try to debug on our side too.
@Harsha-Nori Thanks so much for your response. I did what you suggested and got the same error with llama-cpp-python directly. I'll dig more and try to find a workaround. I know llama.cpp itself can handle the new models (Ollama runs Phi-3-medium just fine), so it will probably come down to building llama-cpp-python against a newer llama.cpp.