This repo showcases a quick and easy way to install Vicuna by lm-sys, based on LLaMA by Meta, one of the most powerful LLMs on the market right now. From my short time playing around with it, I find Vicuna comparable to ChatGPT in many respects, and one of the better LLaMA flavours right now (compared with Stanford's Alpaca or Berkeley's Koala).
Go here and follow the instructions for your OS. Alternatively, use miniconda, or mamba/micromamba if you need speed: mamba is generally faster than conda at solving environments.
conda create -n vicuna-matata python=3.10.9
conda activate vicuna-matata
You can just use this little command to install everything and skip to Step 4:
conda create -n vicuna-matata pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia
conda activate vicuna-matata
Keep in mind that this alternative command will download roughly 2GB of data, and conda can be quite hit-and-miss with its download speed. If it's unbearably slow, you can try some of the suggestions in this thread. This one from sandeepgadhwal helped me:
conda config --set add_anaconda_token False
Your mileage may vary.
pip3 install torch torchvision torchaudio
conda install -c conda-forge cudatoolkit-dev
WARNING: Depending on your internet connection speed, the conda install -c conda-forge cudatoolkit-dev command can take a very long time to complete, with no progress bar or status updates. Mine took anywhere from 10 to 30 minutes, so be patient, do something else and come back later.
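Before moving on, it can be worth confirming that the PyTorch build you just installed can actually see your GPU. The snippet below is a minimal sketch and not part of the original setup: describe_torch_cuda is a hypothetical helper name, and it relies only on standard PyTorch calls (torch.cuda.is_available, torch.cuda.device_count, torch.version.cuda). Run it with the Python interpreter from the vicuna-matata environment.

```python
import importlib.util


def describe_torch_cuda() -> str:
    """Summarise whether PyTorch is installed and whether it can use CUDA."""
    # Check for the package first so a missing install gives a clear message.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch

    if not torch.cuda.is_available():
        return f"torch {torch.__version__} is installed, but CUDA is not available"

    # torch.version.cuda is the CUDA version this PyTorch build was compiled for.
    return (
        f"torch {torch.__version__} sees {torch.cuda.device_count()} CUDA device(s), "
        f"built for CUDA {torch.version.cuda}"
    )


print(describe_torch_cuda())
```

If it reports that CUDA is not available, the GPTQ build later in this guide is unlikely to work, so it is probably better to sort that out first.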
Credits to oobabooga for the amazing work with this front end.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
The Vicuna team did not provide their own weights, only the instructions on how to convert the LLaMA weights (which requires a ton of computing resources), but luckily anon8231489123 came to the rescue and did all that hard work for us. Still, it's a file downloaded from the internet, so use at your own risk.
python download-model.py anon8231489123/vicuna-13b-GPTQ-4bit-128g
mkdir repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
cd GPTQ-for-LLaMa
pip install -r requirements.txt
python setup_cuda.py install
NOTE: python setup_cuda.py install will fail if you're using a GCC version higher than 11.5, so Google how to install multiple versions of GCC on your distro (e.g. Ubuntu) and then set the appropriate environment variables. If you're on an Arch-like, bleeding-edge rolling-release distro (Arch, Artix, EndeavourOS, Manjaro, etc.), just install gcc9-bin from the AUR, then open .bashrc, .zshrc or .profile and add this line:
export CC=gcc-9 CXX=g++-9
Then reload it:
source .profile
(or .bashrc or .zshrc, whichever file you edited)
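To see in advance whether your compiler is past the GCC 11.5 limit mentioned in the note, something like the following can help. It is only a sketch under a few assumptions: parse_version, gcc_is_supported and MAX_GCC are hypothetical names, the build is assumed to honour the CC variable as described in the note, and only the major and minor numbers are compared (the patch level is ignored).

```python
import os
import re
import shutil
import subprocess

# Highest GCC version (major, minor) that this guide says still works.
MAX_GCC = (11, 5)


def parse_version(text):
    """Turn '9.5.0' (or just '12') into a (major, minor) tuple."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def gcc_is_supported(version_text, limit=MAX_GCC):
    """True when the compiler is not newer than the limit (patch level ignored)."""
    return parse_version(version_text) <= limit


# Look at $CC if it is set, otherwise at plain gcc.
compiler = os.environ.get("CC", "gcc")
if shutil.which(compiler) is None:
    print(f"{compiler!r} was not found on PATH")
else:
    # Passing both flags makes old and new GCC releases print the full version.
    output = subprocess.run(
        [compiler, "-dumpfullversion", "-dumpversion"],
        capture_output=True, text=True, check=False,
    ).stdout.strip()
    try:
        if gcc_is_supported(output):
            print(f"{compiler} {output} should be fine")
        else:
            print(f"{compiler} {output} is too new, point CC/CXX at an older GCC")
    except ValueError:
        print(f"could not read a version number from {compiler}")
```

Run it from the same shell you will build in, so that it sees the CC value you exported.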
cd ..
cd ..
python server.py --model anon8231489123/vicuna-13b-GPTQ-4bit-128g --auto-devices --wbits 4 --groupsize 128
Now open up your favourite web browser and go to http://127.0.0.1:7860 (or simply http://localhost:7860). You might also need to turn off your adblocker and proxy for it to work. And that's it: you now have access to one of the most powerful LLMs on the market, right from the comfort and privacy of your own PC.
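If the page does not load, it helps to know whether the server is listening at all or whether the browser (adblocker, proxy) is getting in the way. Here is a small sketch using only Python's standard library. port_is_open is a hypothetical helper, and the host and port are simply the ones from the URL above.

```python
import socket


def port_is_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if port_is_open():
    print("Something is listening on port 7860, so check your browser settings.")
else:
    print("Nothing is listening on port 7860, so check the server.py output.")
```

This only shows that some process accepts connections on that port. It does not prove that the process is the web UI.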
The default interface is an Instruct interface (similar to gpt3-davinci and the like). If you want a chat interface (like ChatGPT), use this command instead:
python server.py --model anon8231489123/vicuna-13b-GPTQ-4bit-128g --auto-devices --wbits 4 --groupsize 128 --chat
Also, the model tends to hallucinate and start saying things like "Human: abcxyz, Assistant: abcxyz". You can fix this by going to the Parameters tab -> Custom stopping strings and typing in:
"### Human:", "Human:", "user:"
You can also add anything else that appears at the start of the line where the AI began hallucinating, such as "Translator:" or "English teacher:".
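To get a feel for what stopping strings do, here is a rough illustration of the idea: cut the reply off at the first place any stop string appears. This is a sketch of the concept only, not the web UI's actual code, and truncate_at_stop is a hypothetical function name.

```python
def truncate_at_stop(text, stop_strings):
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        if not stop:
            continue  # an empty string would match at position 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    # Drop the trailing newline left in front of the removed part.
    return text[:cut].rstrip()


stops = ["### Human:", "Human:", "user:"]
reply = (
    "Paris is the capital of France.\n"
    "### Human: what about Spain?\n"
    "### Assistant: Madrid."
)
print(truncate_at_stop(reply, stops))
# prints: Paris is the capital of France.
```

In this sketch everything from the first "### Human:" onwards is discarded, which is the effect you want: the model's invented follow-up conversation never reaches you.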