Demonstrates Alpaca-LoRA as a Chatbot service, built with Alpaca-LoRA and Gradio. Main features include:
- It enables batch inference by aggregating incoming requests until the previous batch has finished (the batch size is fixed at 4); see the batching sketch after this list.
- It achieves context awareness by keeping the chat history in the following prompt string (a prompt-construction sketch appears after this list):
f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Input: {input} # surrounding context given to the AI
### Instruction: {prompt1} # first instruction/prompt given by the user
### Response: {response1} # first response by the AI to the first prompt
### Instruction: {prompt2} # second instruction/prompt given by the user
### Response: {response2} # second response by the AI to the second prompt
....
"""
- It provides an additional script to run various configurations and see how they affect generation quality and speed.
- It currently supports the following Alpaca-LoRA checkpoints:
- tloen/alpaca-lora-7b: the original 7B Alpaca-LoRA checkpoint by tloen
- chansung/alpaca-lora-13b: the 13B Alpaca-LoRA checkpoint by myself (chansung), tuned with the same script as the original 7B model
- chansung/koalpaca-lora-13b: the 13B Alpaca-LoRA checkpoint by myself (chansung), tuned on the Korean dataset created by Beomi's KoAlpaca project. It works for English (user) to Korean (AI) conversations.
- chansung/alpaca-lora-30b: the 30B Alpaca-LoRA checkpoint by myself (chansung), tuned with the same script as the original 7B model
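
As a rough sketch, the batch aggregation mentioned above can be pictured as a worker thread that drains a request queue in groups of four between generation calls. The code below is illustrative only; names such as request_queue and generate_batch are hypothetical and not part of this repository.

import queue
import threading

BATCH_SIZE = 4  # the batch size is fixed at 4, as described above
request_queue = queue.Queue()  # each item is a (prompt, result_slot) pair

def generate_batch(prompts):
    # placeholder for the real batched model call (tokenize + model.generate)
    return [f"response to: {p}" for p in prompts]

def worker():
    while True:
        batch = [request_queue.get()]  # block until at least one request arrives
        while len(batch) < BATCH_SIZE:
            try:
                batch.append(request_queue.get_nowait())  # aggregate pending requests
            except queue.Empty:
                break
        prompts = [prompt for prompt, _ in batch]
        responses = generate_batch(prompts)  # new requests keep queueing meanwhile
        for (_, result_slot), response in zip(batch, responses):
            result_slot.append(response)  # hand the response back to the caller

threading.Thread(target=worker, daemon=True).start()

A caller would enqueue (prompt, []) and wait for the empty list to be filled with the response; while one batch is generating, new requests simply accumulate in the queue for the next batch.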
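The context-aware prompt shown above can be rebuilt on every turn from the accumulated history. The helper below is a minimal sketch under that assumption; build_prompt and the (prompt, response) history format are illustrative names, not the actual implementation.

def build_prompt(context: str, history: list[tuple[str, str]], new_prompt: str) -> str:
    # Reconstruct the template: fixed preamble, the shared Input, then one
    # Instruction/Response pair per past turn, ending with the new instruction.
    prompt = (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. Write a response that appropriately "
        "completes the request.\n"
        f"### Input: {context}\n"
    )
    for user_prompt, ai_response in history:
        prompt += f"### Instruction: {user_prompt}\n### Response: {ai_response}\n"
    prompt += f"### Instruction: {new_prompt}\n### Response:"
    return prompt

# Example: one prior turn plus a new instruction
print(build_prompt(
    "You are chatting with a travel assistant.",
    [("Suggest a city to visit.", "How about Lisbon?")],
    "What should I see there?",
))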
- Prerequisites
Note that the code only works with Python >= 3.9
$ conda create -n alpaca-serve python=3.9
$ conda activate alpaca-serve
- Install dependencies
$ pip install -r requirements.txt
- Run Gradio application
$ BASE_URL=decapoda-research/llama-7b-hf
$ FINETUNED_CKPT_URL=tloen/alpaca-lora-7b
$
$ python app.py --base_url $BASE_URL --ft_ckpt_url $FINETUNED_CKPT_URL --port 6006
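
Under the hood, the application has to load the base LLaMA checkpoint and attach the LoRA weights before serving. A minimal sketch of that loading step with transformers and peft is shown below, reusing the two URLs from the command above; the exact options used by app.py may differ.

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_url = "decapoda-research/llama-7b-hf"   # value passed as --base_url
ft_ckpt_url = "tloen/alpaca-lora-7b"         # value passed as --ft_ckpt_url

tokenizer = LlamaTokenizer.from_pretrained(base_url)
model = LlamaForCausalLM.from_pretrained(
    base_url,
    torch_dtype=torch.float16,  # half precision to fit the model on a single GPU
    device_map="auto",
)
model = PeftModel.from_pretrained(model, ft_ckpt_url)  # attach the LoRA adapter
model.eval()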