Wheels for llama-cpp-python compiled with cuBLAS support.
Requirements:
- Windows x64, Linux x64, or macOS 11.0+
- CUDA 11.6 - 12.2
- CPython 3.8 - 3.11
llama.cpp, and by extension llama-cpp-python, has migrated to the new GGUF model format and dropped support for GGML. This applies to llama-cpp-python 0.1.79 and later.
ROCm builds for AMD GPUs: https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/tag/rocm
Metal builds for macOS 11.0+: https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/tag/metal
To install, you can use this command:
python -m pip install llama-cpp-python --prefer-binary --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117
This installs the latest llama-cpp-python version available from this index for CUDA 11.7. Change cu117 to select a different CUDA version.
You can also change AVX2 to AVX, AVX512, or basic, depending on which instruction sets your CPU supports. The basic build is compiled without AVX, FMA, and F16C instructions, for older or low-end CPUs.
CPU-only builds are also available by changing cu117 to cpu.
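The index URL is just the base URL followed by a CPU variant directory and a CUDA directory. The sketch below assembles it in Python. The function names are hypothetical, and the flag-to-variant mapping is a heuristic assumption: it treats a CPU as needing the basic build if it lacks any of AVX, FMA, or F16C (inferred from the description of basic above), and reads flags from /proc/cpuinfo, which only exists on Linux.

```python
from __future__ import annotations

from pathlib import Path

# Base URL and directory names copied from the install commands in this README.
BASE_URL = "https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels"
VARIANTS = ("AVX512", "AVX2", "AVX", "basic")


def pick_cpu_variant(flags: set[str]) -> str:
    """Hypothetical heuristic: map CPU feature flags to a wheel variant."""
    # Assumption: "basic" is the only build without AVX, FMA and F16C,
    # so fall back to it whenever any of those flags is missing.
    if not {"avx", "fma", "f16c"} <= flags:
        return "basic"
    if "avx512f" in flags:
        return "AVX512"
    if "avx2" in flags:
        return "AVX2"
    return "AVX"


def cuda_tag(cuda_version: str | None) -> str:
    """'11.7' -> 'cu117'; None selects the CPU-only builds ('cpu')."""
    if cuda_version is None:
        return "cpu"
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def index_url(variant: str, cuda_version: str | None = None) -> str:
    """Build the --extra-index-url value for a variant and CUDA version."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"{BASE_URL}/{variant}/{cuda_tag(cuda_version)}"


def linux_cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Read CPU feature flags from /proc/cpuinfo (Linux only)."""
    for line in Path(path).read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()
```

On Linux, index_url(pick_cpu_variant(linux_cpu_flags()), "12.1") yields a URL in the same form as the commands above; on Windows, check your CPU's supported instruction sets by other means and pass the variant name directly.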
You can install a specific version with:
python -m pip install llama-cpp-python==<version> --prefer-binary --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117
For example, to install 0.1.62 for CUDA 12.1 on a CPU without AVX2 support:
python -m pip install llama-cpp-python==0.1.62 --prefer-binary --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cu121
To list the available versions:
python -m pip index versions llama-cpp-python --index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117
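If you script these installs, a minimal sketch is to build the same pip arguments as a list and run them with subprocess. It uses sys.executable so pip targets the interpreter running the script, for the same reason the commands above use python -m pip. The helper names are hypothetical; note that pip index is marked experimental by pip itself, so its output format may change.

```python
from __future__ import annotations

import subprocess
import sys

PACKAGE = "llama-cpp-python"


def install_args(index_url: str, version: str | None = None) -> list[str]:
    """argv for the install commands above, optionally pinned to a version."""
    spec = f"{PACKAGE}=={version}" if version else PACKAGE
    return [sys.executable, "-m", "pip", "install", spec,
            "--prefer-binary", f"--extra-index-url={index_url}"]


def list_versions_args(index_url: str) -> list[str]:
    """argv for `pip index versions` against this index."""
    return [sys.executable, "-m", "pip", "index", "versions", PACKAGE,
            f"--index-url={index_url}"]


def run(args: list[str]) -> None:
    # check=True raises CalledProcessError if pip exits with a non-zero status.
    subprocess.run(args, check=True)
```

For example, run(install_args("https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cu121", "0.1.62")) is equivalent to the 0.1.62 example above.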
If you are replacing an existing installation, you may need to uninstall it before running the commands above.
Alternatively, you can replace the existing version in a single command with any of the following:
python -m pip install llama-cpp-python --force-reinstall --no-deps --index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117
-OR-
python -m pip install llama-cpp-python==0.1.66 --force-reinstall --no-deps --index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117
-OR-
python -m pip install llama-cpp-python --prefer-binary --upgrade --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117
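Before replacing an installation, it can help to check what is already installed. The sketch below assumes nothing beyond the standard library: importlib.metadata reports the installed distribution version, and the hypothetical replace_args helper rebuilds the --force-reinstall --no-deps command shown above. It cannot tell whether the existing install is a CUDA build, only which version it is.

```python
from __future__ import annotations

import sys
from importlib import metadata


def installed_version(dist: str = "llama-cpp-python") -> str | None:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def replace_args(index_url: str, version: str | None = None,
                 dist: str = "llama-cpp-python") -> list[str]:
    """argv for the one-command replacement (--force-reinstall --no-deps)."""
    spec = f"{dist}=={version}" if version else dist
    return [sys.executable, "-m", "pip", "install", spec,
            "--force-reinstall", "--no-deps", f"--index-url={index_url}"]
```

Because --no-deps is set, pip only needs this index to resolve llama-cpp-python itself, which is why --index-url (rather than --extra-index-url) is enough here.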
Wheels can be manually downloaded from: https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels
I have made renamed llama-cpp-python packages available to ease the transition to GGUF.
The renamed package is installed alongside the main llama-cpp-python package.
This should allow applications to keep GGML support while also supporting GGUF.
python -m pip install llama-cpp-python-ggml --prefer-binary --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117
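An application that keeps both packages installed needs to decide which one should load a given model file. GGUF files begin with the four ASCII bytes GGUF, so a minimal check is to read that header. The helper names below are hypothetical, and package_for_model assumes that any file without the GGUF magic is a legacy GGML model; it does not validate the file further.

```python
GGUF_MAGIC = b"GGUF"  # GGUF model files start with these four ASCII bytes.


def has_gguf_magic(header: bytes) -> bool:
    """True if the first bytes of a file match the GGUF magic."""
    return header[:4] == GGUF_MAGIC


def is_gguf(path: str) -> bool:
    with open(path, "rb") as f:
        return has_gguf_magic(f.read(4))


def package_for_model(path: str) -> str:
    """Hypothetical routing: GGUF -> llama-cpp-python, otherwise the GGML package."""
    return "llama-cpp-python" if is_gguf(path) else "llama-cpp-python-ggml"
```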