Offloads some of the model layers to the GPU, allowing larger models to be loaded
eniompw/llama-cpp-gpu
Load larger models by splitting model layers between the GPU and CPU
Jupyter Notebook · MIT
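
As a rough illustration of the idea in the description, the sketch below estimates how many layers could go on the GPU and passes that count to a loader. It is not taken from the repository. It assumes the `llama-cpp-python` bindings and their `n_gpu_layers` argument. The helper `layers_that_fit`, its per-layer size estimate, and the model path in the usage note are hypothetical.

```python
def layers_that_fit(n_layers: int, layer_bytes: int,
                    free_vram_bytes: int, reserve_bytes: int = 0) -> int:
    """Hypothetical helper: estimate how many layers fit in GPU memory.

    Assumes every layer is roughly the same size (layer_bytes) and that
    reserve_bytes of VRAM is held back for the KV cache and scratch buffers.
    Layers that do not fit stay on the CPU.
    """
    if layer_bytes <= 0:
        raise ValueError("layer_bytes must be positive")
    usable = max(free_vram_bytes - reserve_bytes, 0)
    return min(n_layers, usable // layer_bytes)


def load_model(model_path: str, n_gpu_layers: int):
    """Load a model with n_gpu_layers offloaded to the GPU.

    Assumption: llama-cpp-python is installed with GPU support. The import
    is kept local so the helper above works without the package.
    """
    from llama_cpp import Llama
    return Llama(model_path=model_path, n_gpu_layers=n_gpu_layers)
```

A call might look like `load_model("model.gguf", layers_that_fit(32, 200 * 2**20, 4 * 2**30, 2**30))`, where the file name and all sizes are placeholders. In practice the right layer count depends on quantization and context length, so it is usually tuned by trial.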