# Rust + llama_cpp_2 + Serenity (WIP)

---
## Install Rust

Download and install Rust from https://rustup.rs.

---
## Clone and Build

```bash
git clone https://github.com/Leon1777/llama-discord-bot-rs.git
cd llama-discord-bot-rs
cargo build --release
```

---
## Configure

Add your Discord token to a `.env` file:

```
DISCORD_TOKEN=your_token
```

Or, using PowerShell:

```powershell
$env:DISCORD_TOKEN = "your_token"
```
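At startup the bot reads this token from the environment. A minimal sketch of that step, assuming the `dotenvy` crate is used to load the `.env` file (this repo may wire it up differently):

```rust
use std::env;

fn main() {
    // Load .env into the process environment; a no-op if the file is
    // missing (e.g. when the token was set in the shell instead).
    dotenvy::dotenv().ok();

    let token = env::var("DISCORD_TOKEN")
        .expect("DISCORD_TOKEN must be set in .env or the environment");
    println!("token loaded ({} characters)", token.len());
}
```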
---

## Run the Bot

```bash
cargo run --release
```

or, to enable CUDA:

```bash
cargo run --release --features cuda
```
Available slash commands:

- `/ask <question>`: Ask the bot anything.
- `/reset`: Reset the chat history to the default system prompt.
- `/mission <new_system_prompt>`: Reset the chat history and set a new system prompt.

The bot stores the last 5 messages to maintain context (resets on restart); see the sketch after this list.
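A rolling five-message window like this can be kept with a bounded deque. The sketch below is illustrative only; the type and method names are hypothetical, not taken from this repo:

```rust
use std::collections::VecDeque;

const MAX_HISTORY: usize = 5;

/// Hypothetical chat state: the system prompt plus a rolling message window.
struct ChatHistory {
    system_prompt: String,
    messages: VecDeque<String>,
}

impl ChatHistory {
    fn new(system_prompt: impl Into<String>) -> Self {
        Self {
            system_prompt: system_prompt.into(),
            messages: VecDeque::with_capacity(MAX_HISTORY),
        }
    }

    /// Append a message, evicting the oldest once the window is full.
    fn push(&mut self, message: String) {
        if self.messages.len() == MAX_HISTORY {
            self.messages.pop_front();
        }
        self.messages.push_back(message);
    }

    /// `/reset` clears the history; `/mission` also swaps the system prompt.
    fn reset(&mut self, new_prompt: Option<String>) {
        if let Some(prompt) = new_prompt {
            self.system_prompt = prompt;
        }
        self.messages.clear();
    }
}
```

Because the window lives only in memory, the context is lost whenever the process restarts, matching the behavior described above.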
---

## Download the Model

```python
from huggingface_hub import snapshot_download

model_id = "repo/model"
snapshot_download(
    repo_id=model_id,
    local_dir="model_name",
    local_dir_use_symlinks=False,
    revision="main",
)
```
---

## Set Up Llama.cpp

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
```

---
## Convert to GGUF

```bash
python convert_hf_to_gguf.py model_folder --outfile model_name.gguf --outtype f16
```
---

## Set Up Llama.cpp for Quantization

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build
cd build
cmake .. -G "Ninja" -DCMAKE_BUILD_TYPE=Release
cmake --build .
cd bin
```
---

## Quantize the Model

```bash
./llama-quantize 3.1-8B.fp16.gguf 3.1-8B.q6_K.gguf Q6_K
```
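The quantized `.gguf` produced here is what the bot loads at startup through `llama_cpp_2`. A minimal loading sketch, assuming the `llama-cpp-2` crate's `LlamaBackend`/`LlamaModel` API (exact module paths and signatures vary between crate versions, so treat this as illustrative):

```rust
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::LlamaModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the llama.cpp backend once per process.
    let backend = LlamaBackend::init()?;

    // Default model parameters; with the `cuda` feature enabled,
    // GPU offload can be configured through these params as well.
    let params = LlamaModelParams::default();

    // Load the quantized model produced by the step above.
    let _model = LlamaModel::load_from_file(&backend, "3.1-8B.q6_K.gguf", &params)?;
    println!("model loaded");
    Ok(())
}
```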
As a concrete example of the download step above, here is the same `snapshot_download` call for Mistral-Large-Instruct-2411:

```python
from huggingface_hub import snapshot_download

model_id = "mistralai/Mistral-Large-Instruct-2411"
snapshot_download(
    repo_id=model_id,
    local_dir="models/Mistral-Large-Instruct-2411",
    local_dir_use_symlinks=False,
    revision="main",
)
```

This bot is lightweight, fast, and highly configurable, thanks to Rust, llama_cpp_2, and Serenity.