llama-discord-bot-rs

A Discord bot written in Rust, using Serenity and the llama_cpp_2 bindings to chat with an open-source LLM.


Rust + llama_cpp_2 + Serenity (WIP)

Run the Bot

  1. Install Rust
    Download and install Rust from https://rustup.rs.

  2. Clone and Build

    git clone https://github.com/Leon1777/llama-discord-bot-rs.git
    cd llama-discord-bot-rs
    cargo build --release
  3. Configure
    Add your Discord token to a .env file:

    DISCORD_TOKEN=your_token

    Alternatively, set it as an environment variable (PowerShell):

    $env:DISCORD_TOKEN = "your_token"
  4. Run the Bot

    cargo run --release

    or, to enable CUDA support:

    cargo run --release --features cuda
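
At startup the bot needs the token it reads from the environment. A minimal std-only sketch of picking up DISCORD_TOKEN (illustrative only; the actual bot may load the .env file through a crate such as dotenvy, which is an assumption here):

```rust
use std::env;

// Read the Discord token from the environment; fail with a clear
// message if it is missing (hypothetical sketch, not the bot's exact code).
fn discord_token() -> String {
    env::var("DISCORD_TOKEN")
        .expect("DISCORD_TOKEN not set; add it to .env or export it")
}

fn main() {
    let token = discord_token();
    println!("token loaded ({} chars)", token.len());
}
```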

Commands

  • /ask <question>: Ask the bot anything.
  • /reset: Reset the chat history to the default system prompt.
  • /mission <new_system_prompt>: Reset the chat history and set a new system prompt.
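
Both /reset and /mission clear the stored history; /mission additionally swaps in a new system prompt. A plain-Rust sketch of that state handling (names like ChatHistory are assumptions for illustration, not the bot's actual types):

```rust
// Illustrative chat-state sketch; the real bot wires this into Serenity handlers.
struct ChatHistory {
    system_prompt: String,
    messages: Vec<String>,
}

impl ChatHistory {
    fn new(system_prompt: &str) -> Self {
        Self { system_prompt: system_prompt.to_string(), messages: Vec::new() }
    }

    /// `/reset`: drop the history and restore the default system prompt.
    fn reset(&mut self, default_prompt: &str) {
        self.system_prompt = default_prompt.to_string();
        self.messages.clear();
    }

    /// `/mission`: drop the history and install a new system prompt.
    fn mission(&mut self, new_prompt: &str) {
        self.system_prompt = new_prompt.to_string();
        self.messages.clear();
    }
}
```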

Features

  • Stores the last 5 messages to maintain conversational context (history resets on restart)
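
The rolling 5-message window can be sketched with a bounded deque (standard library only; a hypothetical sketch, not the bot's actual implementation):

```rust
use std::collections::VecDeque;

const MAX_CONTEXT: usize = 5; // keep only the last 5 messages

// Push a message, evicting the oldest one once the window is full.
fn remember(history: &mut VecDeque<String>, msg: String) {
    if history.len() == MAX_CONTEXT {
        history.pop_front();
    }
    history.push_back(msg);
}
```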

Models Tested


Convert HuggingFace Model to GGUF Format

  1. Download the Model

    from huggingface_hub import snapshot_download
    
    model_id = "repo/model"
    snapshot_download(repo_id=model_id, local_dir="model_name",
                      local_dir_use_symlinks=False, revision="main")
  2. Set Up Llama.cpp

    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    pip install -r requirements.txt
  3. Convert to GGUF

    python convert_hf_to_gguf.py model_folder --outfile model_name.gguf --outtype f16

Quantize an FP16 Model

  1. Set Up Llama.cpp for Quantization

    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    mkdir build
    cd build
    cmake .. -G "Ninja" -DCMAKE_BUILD_TYPE=Release
    cmake --build .
    cd bin
  2. Quantize the Model

    ./llama-quantize 3.1-8B.fp16.gguf 3.1-8B.q6_K.gguf Q6_K

Example: Download a Model from HuggingFace

from huggingface_hub import snapshot_download

model_id = "mistralai/Mistral-Large-Instruct-2411"
snapshot_download(
    repo_id=model_id,
    local_dir="models/Mistral-Large-Instruct-2411",
    local_dir_use_symlinks=False,
    revision="main",
)

This bot is lightweight, fast, and highly configurable, thanks to Rust, llama_cpp_2, and Serenity.