- Clone this repository:

```bash
git clone https://github.com/continuedev/ggml-server-example
```

- Move into the folder:

```bash
cd ggml-server-example
```

- Create a virtual environment:

```bash
python3 -m venv env
```

- Activate the virtual environment:

```bash
source env/bin/activate
```

  Use `env\Scripts\activate.bat` on Windows, or `source env/bin/activate.fish` if you use the fish shell.

- Install required packages:

```bash
pip install -r requirements.txt
```
- Download a model to the `models/` folder
  - A convenient source of downloadable models is https://huggingface.co/TheBloke
  - For example, download the 4-bit quantized WizardLM-7B, which we recommend: https://huggingface.co/TheBloke/wizardLM-7B-GGML/blob/main/wizardLM-7B.ggmlv3.q4_0.bin
  - You can also script the download, as in the sketch below
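If you'd rather script the download than use your browser, here is a minimal sketch using the `huggingface_hub` package (an assumption; any download method that places the file in `models/` is fine):

```python
# Minimal sketch: fetch the recommended model into models/.
# Assumes the huggingface_hub package is installed (pip install huggingface_hub);
# downloading the file manually from the link above works just as well.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TheBloke/wizardLM-7B-GGML",
    filename="wizardLM-7B.ggmlv3.q4_0.bin",
    local_dir="models",
)
```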
- Run the server with:

```bash
python3 -m llama_cpp.server --model models/wizardLM-7B.ggmlv3.q4_0.bin
```
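Once the server is up, you can smoke-test it from another terminal. A minimal sketch, assuming the server is listening on its default address (http://localhost:8000), exposing the OpenAI-compatible `/v1/completions` endpoint, and that the `requests` package is installed:

```python
# Minimal sketch: send one completion request to the local server.
# Assumes llama_cpp.server's default address (http://localhost:8000)
# and the requests package (pip install requests).
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={"prompt": "Q: What is a GGML model? A:", "max_tokens": 32},
)
print(response.json()["choices"][0]["text"])
```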
- To set this as your default model in Continue, open `~/.continue/config.json`, either manually or using the `/config` slash command in Continue. Then set `"default_model": "ggml"` (sketched below), reload your VS Code window, and you're good to go!
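For reference, the relevant part of `~/.continue/config.json` would look something like this (a minimal sketch; keep any other settings already present in your file):

```json
{
  "default_model": "ggml"
}
```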
Happy to help. Email us at hi@continue.dev.