Conversational AI sandbox
Vicuna-13B is a fine-tuned version of LLaMA [5].
- LLaMA [1]: family of LLMs (7B to 65B parameters) by Meta
- Alpaca [2]: fine-tuned version of LLaMA
- GPTQ repo [3]
- GPTQ-for-LLaMa: LLaMA-specific implementation of GPTQ [3]
- LLaMA 7B
- LLaMA 13B, 4-bit quantized
- Alpaca 7B, natively fine-tuned, i.e. no LoRA [4].
- Alpaca 7B 4-bit: 4-bit quantized weights
- Alpaca 30B 4-bit: 4-bit quantized, trained using LoRA [4]
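To make the "4-bit" labels above concrete, the sketch below shows naive round-to-nearest 4-bit quantization of a single weight row with one per-row scale. GPTQ [3] chooses the rounded values far more carefully (minimizing layer output error), so this is an illustration of the storage idea only, not of the GPTQ algorithm; all names here are hypothetical.

```python
# Illustration only: what storing weights in 4 bits means at the lowest level.
# Naive round-to-nearest with a single per-row scale; GPTQ [3] is smarter
# about which integer each weight maps to, but uses the same kind of format
# (low-bit integers plus floating-point scales).

def quantize_4bit(weights):
    """Map floats to signed 4-bit integers in [-8, 7] with one per-row scale."""
    scale = max(abs(w) for w in weights) / 7.0  # 7 = largest positive 4-bit value
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Reconstruct approximate float weights from 4-bit integers."""
    return [v * scale for v in q]

row = [0.12, -0.53, 0.07, 0.91, -0.33]
q, scale = quantize_4bit(row)
approx = dequantize_4bit(q, scale)
# Each reconstructed weight lies within half a quantization step of the original.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(row, approx))
```

Storing 4-bit integers plus a scale instead of 16-bit floats is what lets the 13B and 30B checkpoints above fit on a single consumer GPU.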
The following assumes the dependencies in requirements.txt have been installed.
- Clone GPTQ-for-LLaMa into the third_party folder:
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git
- In the GPTQ-for-LLaMa folder, build and install the CUDA kernel (assuming the project environment is active):
CUDA_PATH=/usr/local/cuda-11.7 python setup_cuda.py install
- Test the installation:
CUDA_VISIBLE_DEVICES=0 python test_kernel.py
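Beyond running test_kernel.py, a quick sanity check from the project environment is to verify that the compiled extension is importable. This sketch assumes the extension module built by setup_cuda.py is named quant_cuda, as in the upstream repo; verify the name against the version you cloned.

```python
# Minimal post-install check (a sketch): GPTQ-for-LLaMa's setup_cuda.py builds
# a CUDA extension module; the name "quant_cuda" is an assumption based on the
# upstream repo and should be checked against your clone.
import importlib.util

def gptq_kernel_available() -> bool:
    """True if the compiled quant_cuda extension can be found on sys.path."""
    return importlib.util.find_spec("quant_cuda") is not None

print("quant_cuda built:", gptq_kernel_available())
```

If this prints False after a successful build, the project environment that ran setup_cuda.py is probably not the one currently active.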
[1] Hugo Touvron, et al., LLaMA: Open and Efficient Foundation Language Models, https://arxiv.org/abs/2302.13971
[2] Rohan Taori, et al., Stanford Alpaca: An Instruction-following LLaMA model, https://github.com/tatsu-lab/stanford_alpaca
[3] Elias Frantar, et al., GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, https://arxiv.org/abs/2210.17323
[4] Edward J. Hu, et al., LoRA: Low-Rank Adaptation of Large Language Models, https://arxiv.org/abs/2106.09685