# alpaca_lora_4bit

By leveraging johnsmith0031/alpaca_lora_4bit, the 4-bit quantized Alpaca model fits on an 8GB GPU card and performs inference quickly, which is quite impressive. However, finetuning with LoRA on an 8GB GPU is slow: one epoch takes approximately 60 hours on an NVIDIA RTX 2060 Super when using the Stanford Alpaca dataset, which contains around 50,000 records.
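
For orientation, the sketch below shows the general recipe this repo implements: load a 4-bit quantized LLaMA base model, freeze it, and train only small LoRA adapters on top. It uses the mainstream transformers/peft/bitsandbytes stack (NF4 quantization) rather than the repo's own GPTQ loader, and the model id is an assumption, so treat it as an illustration of the technique, not the repo's actual code path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Assumed model id for illustration; the repo's own loader consumes GPTQ checkpoints instead.
BASE_MODEL = "decapoda-research/llama-7b-hf"

# Load the base model quantized to 4 bits (bitsandbytes NF4 here,
# standing in for the repo's GPTQ quantization).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Cast layer norms / enable gradient flow for training on quantized weights.
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; the 4-bit base weights stay frozen,
# which is what lets finetuning fit in 8GB of VRAM.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```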

I have also created a Colab notebook (https://colab.research.google.com/gist/wesleysanjose/7397fa641be35686e4bcc2c29dc99dd3/lapaca_lora_gptq_4bit.ipynb) so that you can finetune this Alpaca LoRA 4-bit model in Colab. It downloads the LLaMA 7B 4-bit base model and sample datasets from Hugging Face, then performs the finetuning.
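
The notebook's workflow is, roughly: pull the base model and the Stanford Alpaca data from Hugging Face, fold each instruction/input/output record into a training prompt, and run a short training loop. Below is a hedged sketch of that flow, continuing from the previous snippet; the dataset id (tatsu-lab/alpaca), prompt template, and hyperparameters are assumptions, not the notebook's exact settings.

```python
from datasets import load_dataset
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

# A Hugging Face mirror of the Stanford Alpaca dataset; the notebook's
# exact dataset may differ.
data = load_dataset("tatsu-lab/alpaca", split="train")

# LLaMA's tokenizer has no pad token by default; reuse EOS for padding.
tokenizer.pad_token = tokenizer.eos_token

def to_prompt(example):
    # Fold instruction/input/output into a single Alpaca-style prompt.
    prompt = f"### Instruction:\n{example['instruction']}\n\n"
    if example["input"]:
        prompt += f"### Input:\n{example['input']}\n\n"
    prompt += f"### Response:\n{example['output']}"
    return tokenizer(prompt, truncation=True, max_length=512)

tokenized = data.map(to_prompt, remove_columns=data.column_names)

trainer = Trainer(
    model=model,  # the LoRA-wrapped 4-bit model from the sketch above
    train_dataset=tokenized,
    args=TrainingArguments(
        per_device_train_batch_size=1,      # small batch to fit in 8GB VRAM
        gradient_accumulation_steps=16,     # recover an effective batch of 16
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=50,
        output_dir="lora-alpaca-4bit",
    ),
    # mlm=False makes the collator set labels for causal LM training.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The tiny per-device batch with gradient accumulation is the usual trade-off on an 8GB card: it keeps peak memory low at the cost of wall-clock time, which is consistent with the ~60-hour epoch noted above.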