myshell-ai/AIlice

Errors running on Mac

dima-m711 opened this issue · 4 comments

On Mac M1 Max OSX 14.2.1 with python 3.11.

When running with the following parameters:
ailice_main --modelID=hf:Open-Orca/Mistral-7B-OpenOrca --prompt="main" --quantization=8bit --contextWindowRatio=0.6
I get an error because of the quantization parameter:
Encountered an exception, AIlice is exiting: No GPU found. A GPU is needed for quantization.

if not torch.cuda.is_available():
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")

which is expected, because the M1 does not have CUDA.

Without the quantization parameter I get a different error:
Encountered an exception, AIlice is exiting: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers the weights in this format.
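For context, this second error comes from `accelerate`: with `device_map="auto"` on a machine without enough RAM/VRAM, some weights get offloaded to disk, and loading then requires an explicit `offload_folder`. A minimal sketch of a fix (the helper name and structure are hypothetical, not AIlice's actual loading code):

```python
import os

def loader_kwargs(device: str, offload_dir: str = "offload") -> dict:
    """Hypothetical helper: build kwargs for transformers'
    from_pretrained(). When running on CPU, pre-create an offload
    folder so accelerate can spill weights to disk, which is
    exactly what the error message above asks for."""
    kwargs = {"device_map": "auto"}
    if device == "cpu":
        os.makedirs(offload_dir, exist_ok=True)  # folder must exist
        kwargs["offload_folder"] = offload_dir
    return kwargs

# Usage (sketch):
#   AutoModelForCausalLM.from_pretrained(model_id, **loader_kwargs("cpu"))
```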

@stevenlu137 what are your thoughts on using an externally running local LLM via LM Studio or Ollama and implementing it in AModelLocal.py?
They can run all the same Hugging Face models.
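As a sketch of what such a backend would need: LM Studio exposes an OpenAI-compatible HTTP server, so a minimal client needs only the standard library. The URL below assumes LM Studio's default port 1234, and `"local-model"` is a placeholder model name:

```python
import json
from urllib import request

# Assumption: LM Studio's local server at its default port.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_payload(prompt: str, model: str = "local-model",
                  temperature: float = 0.0) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask(prompt: str) -> str:
    """POST the prompt to the local server and return the reply text."""
    data = json.dumps(build_payload(prompt)).encode()
    req = request.Request(
        LMSTUDIO_URL, data=data,
        headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same shape works for Ollama's OpenAI-compatible endpoint by swapping the URL and model name.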

Since several issues have mentioned this requirement recently, I'm considering adding support for inference services like Ollama.

@dima-m711 try changing every mention of "cuda" in the code to "cpu". It will run slowly, but you should be able to generate on the CPU. Hopefully it works on Mac as it does on both Windows and Linux.

Another way is to make the generation not rely on a hardcoded GPU, instead opting for an argument in the config file. Example: yuna-ai allows users to switch between cpu (no GPU), mps (Mac), and cuda (Nvidia).
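That switch can be pure configuration logic. A minimal sketch, where `cuda_ok` / `mps_ok` would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()` in real code (stubbed as parameters here so the logic stands alone):

```python
def pick_device(preferred: str, cuda_ok: bool, mps_ok: bool) -> str:
    """Map a config value ('cpu', 'mps', 'cuda', or 'auto') to a usable
    torch device string, falling back to cpu when the requested
    backend is unavailable."""
    available = {"cpu": True, "cuda": cuda_ok, "mps": mps_ok}
    if preferred != "auto":
        return preferred if available.get(preferred) else "cpu"
    # auto: prefer cuda, then mps, then cpu
    for dev in ("cuda", "mps"):
        if available[dev]:
            return dev
    return "cpu"
```

On an M1 Mac this would resolve an `auto` config to `mps` when the Metal backend is available, and to `cpu` otherwise.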

Done. Please check out the latest version of the dev branch.

Awesome, thanks!
I was able to run it with LM Studio using the instructions from the 'Models (or Services) compatible' section.