bug: models start does not work for an imported model
Cortex version
cortex-1.0.0-rc1-windows-amd64-local-installer
Describe the Bug
Running any imported model returns "Model failed to load with status code: 500".
Steps to Reproduce
1. cortex-beta models import --model_id gemma-2b-Q8_0.gguf --model_path ./gemma-2b-Q8_0.gguf
The import succeeds, and the models subcommands (list, get, update, delete) all work.
2. cortex-beta models start gemma-2b-Q8_0.gguf
It returns:
gguf_init_from_file: failed to open '': 'Invalid argument'
{"timestamp":1728130117,"level":"ERROR","function":"LoadModel","line":186,"message":"llama.cpp unable to load model","model":""}
Model failed to load with status code: 500
Error: ?
Screenshots / Logs
What is your OS?
- MacOS
- Windows
- Linux
What engine are you running?
- cortex.llamacpp (default)
- cortex.tensorrt-llm (Nvidia GPUs)
- cortex.onnx (NPUs, DirectML)
Hi @cloudherder, for models import, an absolute path is required for --model_path for now. We will improve this soon. Apologies for the inconvenience.
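For example, a sketch of the import with an absolute path (the Windows path below is hypothetical; substitute wherever your .gguf file actually lives):
cortex-beta models import --model_id gemma-2b-Q8_0.gguf --model_path C:\models\gemma-2b-Q8_0.gguf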
Thank you for your reply! You have done great work! I tested it with an absolute path; the results are as follows:
The following error is recorded in the Cortex.log file:
20241005 13:29:56.458000 UTC 10188 ERROR ggml_backend_cuda_buffer_type_alloc_buffer: allocating 2539.93 MiB on device 0: cudaMalloc failed: out of memory
- llama_engine.cc:393
20241005 13:29:56.484000 UTC 10188 ERROR llama_model_load: error loading model: unable to allocate backend buffer - llama_engine.cc:393
20241005 13:29:56.484000 UTC 10188 ERROR llama_load_model_from_file: failed to load model
The three models tested are 2.46 GB, 2.48 GB, and 7.06 GB in size. My laptop has 16 GB of memory, and llama.cpp's server.exe can load and use all three models normally.
@cloudherder Seems like you don't have enough VRAM. Please try setting the ngl of your model to 0 or 1.
For example, with the model gemma-2b-Q8_0.gguf, you can check the model config by running:
cortex-beta models get gemma-2b-Q8_0.gguf
Then set the ngl to 1:
cortex-beta models update --model_id gemma-2b-Q8_0.gguf --ngl 1
Run cortex-beta models get gemma-2b-Q8_0.gguf again to check that the config is updated, then try to start the model.
Can you also share the output of the nvidia-smi command?
Hi @cloudherder, apologies for the late response. Can you please set ngl = 0 and try again? Would you also mind sharing the logs from when you run with ngl = 1?
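For reference, a minimal sketch of that update, following the same pattern as the --ngl 1 command above:
cortex-beta models update --model_id gemma-2b-Q8_0.gguf --ngl 0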
Hi @cloudherder, we've released cortex v1.0.1 (release note).
We'd love it if you could give cortex another go with the models you've downloaded.
To update to cortex v1.0.1 (or download it here: https://cortex.so/):
> cortex update
> cortex update --server