Serge is a chat interface crafted with llama.cpp for running Alpaca models. No API keys, entirely self-hosted!
- 🌐 SvelteKit frontend
- 💾 Redis for storing chat history & parameters
- ⚙️ FastAPI + LangChain for the API, wrapping calls to llama.cpp using the python bindings
🐳 Docker:
docker run -d \
--name serge \
-v weights:/usr/src/app/weights \
-v datadb:/data/db/ \
-p 8008:8008 \
ghcr.io/serge-chat/serge:latest
🐙 Docker Compose:
services:
  serge:
    image: ghcr.io/serge-chat/serge:latest
    container_name: serge
    restart: unless-stopped
    ports:
      - 8008:8008
    volumes:
      - weights:/usr/src/app/weights
      - datadb:/data/db/
volumes:
  weights:
  datadb:
Then visit http://localhost:8008/. The API documentation is available at http://localhost:8008/api/docs.
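To confirm that the container is actually serving requests, both URLs can be probed from a script. The following is a minimal sketch using only the Python standard library. The helper names `serge_urls` and `is_reachable` are hypothetical and not part of Serge, and the host and port assume the default mapping shown above.

```python
from urllib.error import URLError
from urllib.request import urlopen


def serge_urls(host="localhost", port=8008):
    """Build the UI and API-docs URLs for a Serge instance (hypothetical helper)."""
    base = f"http://{host}:{port}"
    return {"ui": f"{base}/", "docs": f"{base}/api/docs"}


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP GET to `url` succeeds with a non-error status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For example, `all(is_reachable(u) for u in serge_urls().values())` should evaluate to `True` once the container has finished starting.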
On Windows, ensure that Docker Desktop is installed, WSL2 is configured, and enough free RAM is available to run models.
Instructions for setting up Serge on Kubernetes can be found in the wiki.
We currently support the following models:
- Airoboros 🎈
  - Airoboros-7B
  - Airoboros-13B
  - Airoboros-30B
  - Airoboros-65B
- Alpaca 🦙
  - Alpaca-LoRA-65B
  - GPT4-Alpaca-LoRA-30B
- BigTrans 🗺
  - BigTrans-13B
- Chronos 🌑
  - Chronos-13B
  - Chronos-33B
  - Chronos-Hermes-13B
- GPT4All 🌍
  - GPT4All-13B
- Guanaco 🦙
  - Guanaco-7B
  - Guanaco-13B
  - Guanaco-33B
  - Guanaco-65B
- Koala 🐨
  - Koala-7B
  - Koala-13B
- Llama 🦙
  - FinLlama-33B
  - Llama-Supercot-30B
- Lazarus 💀
  - Lazarus-30B
- Minotaur 🐃
  - Minotaur-15B
- Nous 🧠
  - Nous-Hermes-13B
- OpenAssistant 🎙️
  - OpenAssistant-30B
- Robin 🏹
  - Robin-7B
  - Robin-13B
  - Robin-33B
  - Robin-65B
- Samantha 👩
  - Samantha-7B
  - Samantha-13B
  - Samantha-33B
- Tulu 🎚
  - Tulu-7B
  - Tulu-13B
  - Tulu-30B
- Vicuna 🦙
  - Stable-Vicuna-13B
  - Vicuna-CoT-7B
  - Vicuna-CoT-13B
  - Vicuna-v1.1-7B
  - Vicuna-v1.1-13B
  - VicUnlocked-30B
  - VicUnlocked-65B
  - Vicuna-v1.3-7B
  - Vicuna-v1.3-13B
- Wizard 🧙
  - Wizard-Mega-13B
  - Wizard-Vicuna-Uncensored-7B
  - Wizard-Vicuna-Uncensored-13B
  - Wizard-Vicuna-Uncensored-30B
  - WizardLM-30B
  - WizardLM-Uncensored-7B
  - WizardLM-Uncensored-13B
  - WizardLM-Uncensored-30B
Additional weights can be added to the serge_weights volume using docker cp:
docker cp ./my_weight.bin serge:/usr/src/app/weights/
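When several weight files need to be copied, the same `docker cp` call can be generated per file. The sketch below is one possible way to do that in Python. The function names and the `*.bin` filter are assumptions made for illustration, and the container name and destination path are taken from the command above.

```python
import subprocess
from pathlib import Path


def docker_cp_commands(weights_dir, container="serge",
                       dest="/usr/src/app/weights/"):
    """Build one `docker cp` argument list per .bin file in `weights_dir`."""
    files = sorted(Path(weights_dir).glob("*.bin"))
    return [["docker", "cp", str(f), f"{container}:{dest}"] for f in files]


def copy_weights(weights_dir, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in docker_cp_commands(weights_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if docker exits non-zero.
            subprocess.run(cmd, check=True)
```

Calling `copy_weights("./models")` only prints the commands, which makes it easy to review them before running with `dry_run=False`.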
LLaMA will crash if you don't have enough available memory for the model:
Model | Max RAM Required |
---|---|
7B | 4.5GB |
7B-q2_K | 5.37GB |
7B-q3_K_L | 6.10GB |
7B-q4_1 | 6.71GB |
7B-q4_K_M | 6.58GB |
7B-q5_1 | 7.56GB |
7B-q5_K_M | 7.28GB |
7B-q6_K | 8.03GB |
7B-q8_0 | 9.66GB |
13B | 12GB |
13B-q2_K | 8.01GB |
13B-q3_K_L | 9.43GB |
13B-q4_1 | 10.64GB |
13B-q4_K_M | 10.37GB |
13B-q5_1 | 12.26GB |
13B-q5_K_M | 11.73GB |
13B-q6_K | 13.18GB |
13B-q8_0 | 16.33GB |
33B | 20GB |
33B-q2_K | 16.21GB |
33B-q3_K_L | 19.78GB |
33B-q4_1 | 22.83GB |
33B-q4_K_M | 22.12GB |
33B-q5_1 | 26.90GB |
33B-q5_K_M | 25.55GB |
33B-q6_K | 29.19GB |
33B-q8_0 | 37.06GB |
65B | 50GB |
65B-q2_K | 29.95GB |
65B-q3_K_L | 37.15GB |
65B-q4_1 | 43.31GB |
65B-q4_K_M | 41.85GB |
65B-q5_1 | 51.47GB |
65B-q5_K_M | 48.74GB |
65B-q6_K | 56.06GB |
65B-q8_0 | 71.87GB |
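The table can be checked against the memory that is actually free before a model is loaded. The following is a rough sketch, assuming a Linux host where `/proc/meminfo` is readable and treating the table's GB figures as GiB. The dictionary holds only a few rows copied from the table, and the function names are hypothetical.

```python
# A few rows copied from the table above (values in GB); extend as needed.
MAX_RAM_GB = {
    "7B-q4_K_M": 6.58,
    "13B-q4_K_M": 10.37,
    "33B-q4_K_M": 22.12,
    "65B-q4_K_M": 41.85,
}


def available_gb(meminfo_text):
    """Parse the MemAvailable line of /proc/meminfo text and return GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib / (1024 * 1024)
    raise ValueError("MemAvailable not found")


def models_that_fit(free_gb, table=MAX_RAM_GB):
    """Return the model names whose listed requirement is at most `free_gb`."""
    return [name for name, need in table.items() if need <= free_gb]
```

On Linux, `models_that_fit(available_gb(open("/proc/meminfo").read()))` lists the entries from the dictionary that should load without exhausting memory.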
Need help? Join our Discord
If you discover a bug or have a feature idea, feel free to open an issue or PR.
To run Serge in development mode:
git clone https://github.com/serge-chat/serge.git
cd serge
DOCKER_BUILDKIT=1 docker compose -f docker-compose.dev.yml up -d --build