llama-discord-bot-rs

A Discord bot written in Rust, using Serenity and the llama_cpp_2 bindings to chat with an open-source LLM.


Rust + llama_cpp_2 + Serenity (WIP)

Run the Bot

  1. Install Rust
    Download and install Rust from https://rustup.rs.

  2. Clone and Build

    git clone https://github.com/Leon1777/llama-discord-bot-rs.git
    cd llama-discord-bot-rs
    cargo build --release
  3. Configure
    Add your Discord token to a .env file:

    DISCORD_TOKEN=your_token

    Alternatively, set it as an environment variable (PowerShell):

    $env:DISCORD_TOKEN = "your_token"
  4. Run the Bot

    cargo run --release

    or, to enable CUDA support:

    cargo run --release --features cuda
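
At startup the bot needs the token it reads from the environment. A minimal std-only sketch of picking up DISCORD_TOKEN (illustrative only; the actual bot may load the .env file through a crate such as dotenvy, which is an assumption here):

```rust
use std::env;

// Read the Discord token from the environment; fail with a clear
// message if it is missing (hypothetical sketch, not the bot's exact code).
fn discord_token() -> String {
    env::var("DISCORD_TOKEN")
        .expect("DISCORD_TOKEN not set; add it to .env or export it")
}

fn main() {
    let token = discord_token();
    println!("token loaded ({} chars)", token.len());
}
```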

Commands

  • /ask <question>: Ask the bot anything.
  • /reset: Reset the chat history to the default system prompt.
  • /mission <new_system_prompt>: Reset the chat history and set a new system prompt.
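
Both /reset and /mission clear the stored history; /mission additionally swaps in a new system prompt. A plain-Rust sketch of that state handling (names like ChatHistory are assumptions for illustration, not the bot's actual types):

```rust
// Illustrative chat-state sketch; the real bot wires this into Serenity handlers.
struct ChatHistory {
    system_prompt: String,
    messages: Vec<String>,
}

impl ChatHistory {
    fn new(system_prompt: &str) -> Self {
        Self { system_prompt: system_prompt.to_string(), messages: Vec::new() }
    }

    /// `/reset`: drop the history and restore the default system prompt.
    fn reset(&mut self, default_prompt: &str) {
        self.system_prompt = default_prompt.to_string();
        self.messages.clear();
    }

    /// `/mission`: drop the history and install a new system prompt.
    fn mission(&mut self, new_prompt: &str) {
        self.system_prompt = new_prompt.to_string();
        self.messages.clear();
    }
}
```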

Features

  • Stores the last 5 messages to maintain conversational context (history resets on restart)
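
The rolling 5-message window can be sketched with a bounded deque (standard library only; a hypothetical sketch, not the bot's actual implementation):

```rust
use std::collections::VecDeque;

const MAX_CONTEXT: usize = 5; // keep only the last 5 messages

// Push a message, evicting the oldest one once the window is full.
fn remember(history: &mut VecDeque<String>, msg: String) {
    if history.len() == MAX_CONTEXT {
        history.pop_front();
    }
    history.push_back(msg);
}
```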

Models Tested


Convert HuggingFace Model to GGUF Format

  1. Download the Model

    from huggingface_hub import snapshot_download
    
    model_id = "repo/model"
    snapshot_download(repo_id=model_id, local_dir="model_name",
                      local_dir_use_symlinks=False, revision="main")
  2. Set Up Llama.cpp

    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    pip install -r requirements.txt
  3. Convert to GGUF

    python convert_hf_to_gguf.py model_folder --outfile model_name.gguf --outtype f16

Quantize an FP16 Model

  1. Set Up Llama.cpp for Quantization

    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    mkdir build
    cd build
    cmake .. -G "Ninja" -DCMAKE_BUILD_TYPE=Release
    cmake --build .
    cd bin
  2. Quantize the Model

    ./llama-quantize 3.1-8B.fp16.gguf 3.1-8B.q6_K.gguf Q6_K

Example: Download a Model from HuggingFace

from huggingface_hub import snapshot_download

model_id = "mistralai/Mistral-Large-Instruct-2411"
snapshot_download(
    repo_id=model_id,
    local_dir="models/Mistral-Large-Instruct-2411",
    local_dir_use_symlinks=False,
    revision="main",
)

This bot is lightweight, fast, and highly configurable, thanks to Rust, llama_cpp_2, and Serenity.