
Primary language: Python · License: Apache-2.0

💬🚀 LLM as a Chatbot Service

The purpose of this repository is to let people use many open-source, instruction-following, fine-tuned LLMs as a chatbot service. Because different models behave differently and require differently formatted prompts, I made a very simple library, Ping Pong, for model-agnostic conversation and context management. I also made GradioChat, a UI that looks similar to HuggingChat but is built entirely in Gradio. These two projects are fully integrated to power this project.

Context management

Different models may use different strategies to manage context, so if you want to know the exact strategy applied to each model, take a look at the chats directory. Here are the basic ideas I started with. I found that long prompts eventually slow down generation a lot, so prompts should be kept as short and as concise as possible. In a previous version, I accumulated all the past conversations, and that did not go well.

  • In every turn of the conversation, only the past N exchanges are kept. Think of N as a hyper-parameter; as an experiment, only the past 2-3 exchanges are currently kept for all models.
  • (TBD) In every turn of the conversation, the conversation is summarized or key information is extracted, and that summary is provided in every subsequent turn.
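The sliding-window idea above can be sketched in a few lines of Python. All names here are illustrative assumptions; the actual logic lives in the Ping Pong library and the chats directory.

```python
# Minimal sketch of "keep only the past N exchanges" when building a prompt.
# Function name, prompt format, and roles are hypothetical, not the real API.

def build_prompt(history, user_message, n_keep=3):
    """Build a prompt from only the last `n_keep` (user, bot) exchanges."""
    recent = history[-n_keep:]  # drop older turns to keep the prompt short
    lines = []
    for user_turn, bot_turn in recent:
        lines.append(f"User: {user_turn}")
        lines.append(f"Assistant: {bot_turn}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")
    return "\n".join(lines)

history = [
    ("Hi", "Hello!"),
    ("What is Gradio?", "A Python UI library."),
    ("Thanks", "You're welcome."),
    ("What's 2+2?", "4."),
]
# With n_keep=2, the first two exchanges are dropped from the prompt.
prompt = build_prompt(history, "Summarize our chat.", n_keep=2)
```

Treating N as a hyper-parameter then just means tuning `n_keep` per model against generation speed and coherence.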

Currently supported models

Checkout the list of models

Instructions

  1. Prerequisites

Note that the code only works with Python >= 3.9 and gradio >= 3.32.0.

$ conda create -n llm-serve python=3.9
$ conda activate llm-serve
  2. Install dependencies. flash-attn and triton are included to support MPT models. If you don't want to use MPT, comment them out; otherwise you will hit two module-not-found errors and will have to install the packaging and torch packages as the errors appear.
$ cd LLM-As-Chatbot
$ pip install -r requirements.txt
  3. Run the Gradio application
$ python app.py

How to plugin your own model

Follow these steps to bring your own model into this project.

  1. Add your model spec to model_cards.json. If you don't have a thumbnail image, just leave it as a blank string ("").
  2. Add a button for your model in app.py. Don't forget to give it a name in gr.Button and gr.Markdown; placeholder buttons have their names omitted. Assign the gr.Button to a variable with a name of your choice.
  3. Add the button variable to the button list in app.py.
  4. Determine the model type in global_vars.py. If you think your model is similar to one of the existing ones, just add a filtering rule (if-else) and give it the same name.
  5. (Optional) If your model is totally new, give it a new model_type in global_vars.py and make the corresponding changes in utils.py and chats/central.py.
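Step 4's if-else filtering could look like the sketch below. The function name and the match strings are hypothetical; check global_vars.py for the actual rules and model_type names.

```python
# Hypothetical sketch of mapping a model name to an existing model_type
# via simple if-else filtering, as described in step 4.

def detect_model_type(model_name: str) -> str:
    name = model_name.lower()
    if "alpaca" in name:
        return "alpaca"
    elif "mpt" in name:
        return "mpt"
    elif "stablelm" in name:
        return "stablelm"
    else:
        # A totally new model needs its own model_type (step 5), plus
        # matching changes in utils.py and chats/central.py.
        return "custom"
```

Giving a similar model an existing name this way reuses that model's prompt formatting and context-management strategy without further changes.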

Todos

  • Gradio components to control generation configurations
  • Flan-based Alpaca models
  • Multiple conversation management
  • Implement server only option w/ FastAPI
  • ChatGPT's plugin like features

Acknowledgements

  • I am thankful to Jarvislabs.ai, who generously provided free GPU resources to experiment with Alpaca-LoRA deployment and share it with the community.
  • I am thankful to Common Computer, who generously provided an A100 (40G) x 8 DGX workstation for fine-tuning the models.