Yet Another LLaMA Discord Bot

What is this?

A chatbot for Discord using Meta's LLaMA model, 4-bit quantized. The 13 billion parameters model fits within less than 9 GiB VRAM.

Installation

Before you do any of this, you will need a bot token. If you don't have a bot token, follow this guide to make a bot and then add the bot to your server.

Presently this is Linux only, but you might be able to make it work with other OSs.

Make sure you have Python 3.10+, virtualenv (pip install virtualenv), and CUDA installed.
Clone the bot and setup the virtual environment.

git clone https://github.com/AmericanPresidentJimmyCarter/yal-discord-bot/
cd yal-discord-bot
python3 -m virtualenv env
source env/bin/activate
pip install -r requirements.txt

Setup transformers fork and ignore any version incompatibility errors when you do this.

git clone https://github.com/huggingface/transformers/
cd transformers
git checkout 20e54e49fa11172a893d046f6e7364a434cbc04f
pip install -e .
cd ..

Build 4-bit CUDA kernel.

cd bot/llama_model
python setup_cuda.py install
cd ../..

Download the 4-bit quantized model to somewhere local. For bigger/smaller 4-bit quantized weights, refer to this link.

wget https://huggingface.co/Neko-Institute-of-Science/LLaMA-13B-4bit-128g/resolve/main/llama-13b-4bit-128g.safetensors

Fire up the bot.

cd bot
python -m bot $YOUR_BOT_TOKEN --allow-queue -g $YOUR_GUILD --llama-model="Neko-Institute-of-Science/LLaMA-13B-4bit-128g" --groupsize=128 --load-checkpoint="path/to/llama/weights/llama-13b-4bit-128g.safetensors"

Ensure that $YOUR_BOT_TOKEN and $YOUR_GUILD are set to what they should be, --load-checkpoint=..." is pointing at the correct location of the weights, and --llama-model=... is pointing at the correct location in Huggingface to find the configuration for the weights.

Using an ALPACA model (Recommended)

You can use any ALPACA model by setting the --alpaca flag, which will allow you to add input strings as well as automatically format your prompt into the form expected by ALPACA.

Recommended 4-bit ALPACA weights are as follows:

Or GPT4 finetuned (better coding responses, more restrictive in content):

30b (MetaIX/GPT4-X-Alpaca-30B-Int4)

cd bot
python -m bot $YOUR_BOT_TOKEN --allow-queue -g $YOUR_GUILD --alpaca --groupsize=128 --llama-model="elinas/alpaca-30b-lora-int4" --load-checkpoint="path/to/alpaca/weights/alpaca-30b-4bit-128g.safetensors"