Llamagram

A simple telegram bot for a chat instance with your local Llama model.

Utilising LangChain, llama-cpp-python and GGUF models for CPU and GPU support.

Prerequisites

  • Generate a bot on Telegram using @BotFather

    • set a name for your bot and obtain the token to access the HTTP API
    • place the API token in botdata/credentials.py
  • Get a GGUF model of your choice, for example from TheBloke on Hugging Face

    • tested with llama2_7b_chat.Q4_K_M.gguf
    • you should be able to use any GGUF model (not only Llama 2), but you may have to adjust the initial prompt to the model used
    • place the path to your model file in botdata/settings.py (a sketch of both config files follows this list)
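
A minimal sketch of what the two config files might contain. The variable names API_TOKEN and model_path are assumptions for illustration; use whatever names app.py actually imports:

    # botdata/credentials.py
    API_TOKEN = "YOUR_TELEGRAM_BOT_TOKEN"  # the HTTP API token from @BotFather

    # botdata/settings.py
    model_path = "./models/llama2_7b_chat.Q4_K_M.gguf"  # path to your GGUF file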

Setup

  • create and activate a virtual environment
    # Windows
    python -m venv env
    .\env\Scripts\activate
    
    # Linux
    sudo apt install python3.10-venv
    python3 -m venv env
    source env/bin/activate
    

GPU (skip for CPU only)

  • the following steps only cover CUDA; if you are not using an NVIDIA GPU, follow the instructions on the llama-cpp-python GitHub

  • make sure CUDA is installed on your system, or install it according to the Official Docs

  • install llama-cpp-python with GPU support

    # Windows
    $env:FORCE_CMAKE=1
    $env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
    pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
    
    # Linux
    export CMAKE_ARGS="-DLLAMA_CUBLAS=on"
    export FORCE_CMAKE=1
    pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
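
To verify that the CUDA build actually offloads to the GPU, a quick sanity check (the model path is a placeholder for your own file):

    # run inside the activated venv
    from llama_cpp import Llama

    # n_gpu_layers=-1 offloads all layers; with a working CUDA build the
    # startup log reports the layers assigned to the GPU
    llm = Llama(model_path="./models/llama2_7b_chat.Q4_K_M.gguf", n_gpu_layers=-1)
    out = llm("Q: What is the capital of France? A:", max_tokens=16)
    print(out["choices"][0]["text"])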
    

CPU & GPU

  • install requirements
    pip install -r requirements.txt
    

Usage

After setting the API token and model path and installing the dependencies with either CPU or GPU support, simply run:

python app.py

and start a chat with your bot on Telegram.
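
For orientation, a minimal sketch of how the pieces typically fit together. This is not the repository's actual app.py; the config imports reuse the hypothetical names from above, and depending on your LangChain version the import may live in langchain_community.llms instead:

    from langchain.llms import LlamaCpp
    from telegram import Update
    from telegram.ext import ApplicationBuilder, ContextTypes, MessageHandler, filters

    from botdata.credentials import API_TOKEN  # hypothetical name
    from botdata.settings import model_path    # hypothetical name

    # load the GGUF model once at startup; n_gpu_layers only matters for CUDA builds
    llm = LlamaCpp(model_path=model_path, n_ctx=2048, n_gpu_layers=-1)

    async def chat(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
        # note: the model call blocks the event loop; fine for a single-user bot
        answer = llm(update.message.text)
        await update.message.reply_text(answer)

    app = ApplicationBuilder().token(API_TOKEN).build()
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, chat))
    app.run_polling()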

Roadmap

  • ToDos
    • multi-memory / persistent memory (one possible direction is sketched below)
    • logging
    • error handling
    • skip /start message
    • exit / abort function
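
For the multi-memory item, one possible direction (a sketch under the assumption that LangChain stays in the stack, not a committed design) is keeping a separate conversation memory per Telegram chat ID:

    from langchain.chains import ConversationChain
    from langchain.memory import ConversationBufferMemory

    chains = {}  # chat_id -> ConversationChain with its own history

    def get_chain(chat_id, llm):
        # lazily create one chain (and one memory) per chat
        if chat_id not in chains:
            chains[chat_id] = ConversationChain(
                llm=llm, memory=ConversationBufferMemory()
            )
        return chains[chat_id]

    # in the handler: get_chain(update.effective_chat.id, llm).predict(input=text)

This keeps histories separate per chat but only in RAM; true persistence would additionally require serialising the memory contents to disk.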
