Demonstrates Alpaca-LoRA as a Chatbot service, built with Alpaca-LoRA and Gradio. Main features include:
- It enables batch inference by aggregating incoming requests until the previous batch has finished (the batch size is fixed at 4); see the batching sketch after this list.
- It achieves context awareness by keeping the chat history in the following prompt string (a prompt-construction sketch appears after this list):
f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Input: {input} # surrounding context given to the AI
### Instruction: {prompt1} # first instruction/prompt given by the user
### Response: {response1} # first response by the AI to the first prompt
### Instruction: {prompt2} # second instruction/prompt given by the user
### Response: {response2} # second response by the AI to the second prompt
....
"""
- It provides an additional script to run various configurations and see how they affect generation quality and speed.
- It currently supports the following Alpaca-LoRA checkpoints:
- tloen/alpaca-lora-7b: the original 7B Alpaca-LoRA checkpoint by tloen
- chansung/alpaca-lora-13b: the 13B Alpaca-LoRA checkpoint by myself (chansung), tuned with the same script as the original 7B model
- chansung/koalpaca-lora-13b: the 13B Alpaca-LoRA checkpoint by myself (chansung), tuned on the Korean dataset created by Beomi's KoAlpaca project. It works for English (user) to Korean (AI) conversations.
- chansung/alpaca-lora-30b: the 30B Alpaca-LoRA checkpoint by myself (chansung), tuned with the same script as the original 7B model
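
As a rough sketch, the batch aggregation mentioned above can be pictured as a worker thread that drains a request queue in groups of four between generation calls. The code below is illustrative only; names such as request_queue and generate_batch are hypothetical and not part of this repository.

import queue
import threading

BATCH_SIZE = 4  # the batch size is fixed at 4, as described above
request_queue = queue.Queue()  # each item is a (prompt, result_slot) pair

def generate_batch(prompts):
    # placeholder for the real batched model call (tokenize + model.generate)
    return [f"response to: {p}" for p in prompts]

def worker():
    while True:
        batch = [request_queue.get()]  # block until at least one request arrives
        while len(batch) < BATCH_SIZE:
            try:
                batch.append(request_queue.get_nowait())  # aggregate pending requests
            except queue.Empty:
                break
        prompts = [prompt for prompt, _ in batch]
        responses = generate_batch(prompts)  # new requests keep queueing meanwhile
        for (_, result_slot), response in zip(batch, responses):
            result_slot.append(response)  # hand the response back to the caller

threading.Thread(target=worker, daemon=True).start()

A caller would enqueue (prompt, []) and wait for the empty list to be filled with the response; while one batch is generating, new requests simply accumulate in the queue for the next batch.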
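The context-aware prompt shown above can be rebuilt on every turn from the accumulated history. The helper below is a minimal sketch under that assumption; build_prompt and the (prompt, response) history format are illustrative names, not the actual implementation.

def build_prompt(context: str, history: list[tuple[str, str]], new_prompt: str) -> str:
    # Reconstruct the template: fixed preamble, the shared Input, then one
    # Instruction/Response pair per past turn, ending with the new instruction.
    prompt = (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. Write a response that appropriately "
        "completes the request.\n"
        f"### Input: {context}\n"
    )
    for user_prompt, ai_response in history:
        prompt += f"### Instruction: {user_prompt}\n### Response: {ai_response}\n"
    prompt += f"### Instruction: {new_prompt}\n### Response:"
    return prompt

# Example: one prior turn plus a new instruction
print(build_prompt(
    "You are chatting with a travel assistant.",
    [("Suggest a city to visit.", "How about Lisbon?")],
    "What should I see there?",
))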
- Prerequisites
Note that the code only works with Python >= 3.9
$ conda create -n alpaca-serve python=3.9
$ conda activate alpaca-serve
- Install dependencies
$ pip install -r requirements.txt
- Run Gradio application
$ BASE_URL=decapoda-research/llama-7b-hf
$ FINETUNED_CKPT_URL=tloen/alpaca-lora-7b
$
$ python app.py --base_url $BASE_URL --ft_ckpt_url $FINETUNED_CKPT_URL --port 6006
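
Under the hood, the application has to load the base LLaMA checkpoint and attach the LoRA weights before serving. A minimal sketch of that loading step with transformers and peft is shown below, reusing the two URLs from the command above; the exact options used by app.py may differ.

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_url = "decapoda-research/llama-7b-hf"   # value passed as --base_url
ft_ckpt_url = "tloen/alpaca-lora-7b"         # value passed as --ft_ckpt_url

tokenizer = LlamaTokenizer.from_pretrained(base_url)
model = LlamaForCausalLM.from_pretrained(
    base_url,
    torch_dtype=torch.float16,  # half precision to fit the model on a single GPU
    device_map="auto",
)
model = PeftModel.from_pretrained(model, ft_ckpt_url)  # attach the LoRA adapter
model.eval()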