Using MLX on macOS to run Llama 2. Highly experimental.
Install this plugin in the same environment as LLM.
    llm install https://github.com/simonw/llm-mlx-llama/archive/refs/heads/main.zip
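If the install succeeded, the plugin should show up when you list LLM's installed plugins:
    llm plugins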
Download the mlx-llama/Llama-2-7b-chat-mlx model from Hugging Face into a local Llama-2-7b-chat-mlx directory:
    huggingface-cli download --local-dir Llama-2-7b-chat-mlx mlx-llama/Llama-2-7b-chat-mlx
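Note that huggingface-cli is not part of this plugin. If you don't already have it, it is provided by the huggingface_hub package:
    pip install -U "huggingface_hub[cli]"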
Pass the path to that directory and to its tokenizer.model file as options when you run a prompt:
    llm -m mlx-llama \
      'five great reasons to get a pet pelican:' \
      -o model Llama-2-7b-chat-mlx \
      -o tokenizer Llama-2-7b-chat-mlx/tokenizer.model
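Typing both -o options on every call gets repetitive. A small shell function can wrap them; this is just a convenience sketch for bash or zsh, assuming the model directory sits in your current working directory (the mlx-llama-prompt name is an arbitrary choice):
    # Wrap the fixed model and tokenizer options so only the prompt varies.
    mlx-llama-prompt() {
      llm -m mlx-llama "$1" \
        -o model Llama-2-7b-chat-mlx \
        -o tokenizer Llama-2-7b-chat-mlx/tokenizer.model
    }
    # Example call:
    mlx-llama-prompt 'five great reasons to get a pet pelican:'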
Chat mode and continuing a conversation are not yet supported.
To set up this plugin locally, first check out the code. Then create a new virtual environment:
    cd llm-mlx-llama
    python3 -m venv venv
    source venv/bin/activate
Now install the dependencies and test dependencies:
    llm install -e '.[test]'
To run the tests:
    pytest
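While iterating on a change you may only want a subset of the tests; pytest's -k option filters by test name substring (the mlx pattern here is just an example):
    pytest -k mlx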