A simple Telegram bot for a chat instance with your local Llama model.
Utilising LangChain, llama-cpp-python and GGUF models for CPU and GPU support.
- Generate a bot on Telegram using @BotFather
  - set a name for your bot and obtain the token to access the HTTP API
  - place the API token in `botdata/credentials.py` (see the sketch below)
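A minimal sketch of what `botdata/credentials.py` could contain; the variable name `TOKEN` is an assumption, so use whatever name `app.py` actually imports:

```python
# botdata/credentials.py -- minimal sketch; the variable name TOKEN is an
# assumption, match it to whatever app.py imports.
TOKEN = "123456789:ABCdefGhIJKlmNoPQRstuVWXyz"  # token issued by @BotFather
```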
- Get a GGUF model of your choice, for example from TheBloke on Hugging Face
  - tested with `llama2_7b_chat.Q4_K_M.gguf`
  - you should be able to use any GGUF model (not only Llama 2), but you might have to adjust the initial prompt according to the model used
  - place the path to your model file in `botdata/settings.py` (see the sketch below)
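A minimal sketch of what `botdata/settings.py` could contain; the variable name `MODEL_PATH` and the path are assumptions:

```python
# botdata/settings.py -- minimal sketch; MODEL_PATH is an assumed name,
# point it at the GGUF file you downloaded.
MODEL_PATH = "models/llama2_7b_chat.Q4_K_M.gguf"
```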
- create and activate a virtual environment

```shell
# Windows
python -m venv env
.\env\Scripts\activate

# Linux
sudo apt install python3.10-venv
python -m venv env
source env/bin/activate
```
- the following steps only cover CUDA; if you are not using an NVIDIA GPU, follow the instructions on the llama-cpp-python GitHub
  - make sure CUDA is installed on your system, or install it according to the Official Docs
  - install llama-cpp-python with GPU support

```shell
# Windows
$env:FORCE_CMAKE=1
$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

# Linux
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```
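To verify that the build actually uses the GPU, a quick smoke test like the following can help (the model path and layer count are assumptions); with a cuBLAS-enabled build, the llama.cpp startup log reports layers being offloaded to the GPU:

```python
# Smoke test for the GPU build: loading a model with n_gpu_layers > 0
# should print CUDA/cuBLAS offload information in the llama.cpp log.
from llama_cpp import Llama

llm = Llama(model_path="models/llama2_7b_chat.Q4_K_M.gguf", n_gpu_layers=32)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```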
- install requirements

```shell
pip install -r requirements.txt
```
After placing the API token and the model path and installing the dependencies with either GPU or CPU support, simply run:

```shell
python app.py
```

and start a chat with your bot on Telegram.
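For orientation, the wiring inside `app.py` roughly amounts to the sketch below. The imports, variable names, and parameters are assumptions (python-telegram-bot v20+ and LangChain's `LlamaCpp` wrapper); the actual code may differ:

```python
# Minimal sketch of the moving parts, not the actual app.py:
# each Telegram text message is forwarded to the local GGUF model.
from langchain_community.llms import LlamaCpp  # older LangChain: from langchain.llms import LlamaCpp
from telegram import Update
from telegram.ext import ApplicationBuilder, ContextTypes, MessageHandler, filters

from botdata.credentials import TOKEN      # assumed variable name
from botdata.settings import MODEL_PATH    # assumed variable name

# Load the local model; n_gpu_layers > 0 offloads layers to the GPU,
# set it to 0 for a CPU-only setup.
llm = LlamaCpp(model_path=MODEL_PATH, n_ctx=2048, n_gpu_layers=32)

async def chat(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # Run the incoming message through the model and reply with the completion.
    reply = llm.invoke(update.message.text)
    await update.message.reply_text(reply)

app = ApplicationBuilder().token(TOKEN).build()
# Forward plain text messages (but not commands like /start) to the model.
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, chat))
app.run_polling()
```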
- ToDos
  - multi-memory / persistent memory
  - logging
  - error handling
  - skip /start message
  - exit / abort function
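For the /start item, one possible approach (assuming python-telegram-bot v20+ and building on the sketch above) is a dedicated command handler, so the command never reaches the model:

```python
from telegram.ext import CommandHandler

async def start(update, context):
    # Greet the user instead of sending "/start" to the model.
    await update.message.reply_text("Hi! Send me a message and I'll ask the local model.")

app.add_handler(CommandHandler("start", start))
```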