This project implements a simple chatbot using the `microsoft/phi-2` model.
The goal is to create a chatbot that classifies the user input into three classes (support, commercial, and joke) and responds accordingly, using a different prompt template depending on the classification output.
Here you can see a GIF showcasing a short conversation.
If you want to quickly run the code, you have three options:

- The easiest way is to run the Not-so-simple phi-2 chatbot notebook on Colab
- Download the code and run the `not_so_simple_phi2_chatbot.ipynb` notebook (you need to install `requirements.txt`)
- Download the code and run `gradio app.py` (you need to install `requirements.txt`)
⚠️ Warning: Inference is much faster on GPU than on CPU.
In addition to the functionality above, we also implemented a simple LoRA fine-tuning notebook (`finetune.ipynb`). It fine-tunes the model with a toy conversation dataset.
You can also run the `gradio` app using Docker. Build the image with `sudo docker build -t gradio-app-image .` and then run it with `sudo docker run -p 7860:7860 -t gradio-app-image`.
Broadly speaking, the solution to this problem was to:
- Take the input query from the user (`human_input`) and classify it into one of the three categories
- Define a different instruction set (`instructions`) depending on the classification result
- Retrieve the history of past chats (`chat_history`)
- Put everything together in a template like the one below.
```
You are a chatbot having a conversation with a human. \
Follow the given instructions to reply to the Human message below.

Instructions:{instructions}

{chat_history}
Human: {human_input}
Chatbot:
```
In this way, the chatbot can not only follow the conversation with the human, but also receive specific instructions on how to behave depending on the topic of the conversation.
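The routing described above can be sketched in plain Python. The category names match the project, but the instruction strings here are illustrative placeholders, not the project's actual prompts:

```python
# Illustrative instruction sets per category (placeholder text, not the
# project's actual prompts).
INSTRUCTIONS = {
    "support": "Help the user solve their technical problem politely.",
    "commercial": "Answer questions about pricing and plans.",
    "joke": "Reply with a light-hearted, friendly joke.",
}

TEMPLATE = (
    "You are a chatbot having a conversation with a human. "
    "Follow the given instructions to reply to the Human message below.\n"
    "Instructions:{instructions}\n"
    "{chat_history}\n"
    "Human: {human_input}\n"
    "Chatbot:"
)

def build_prompt(category: str, chat_history: str, human_input: str) -> str:
    """Pick the instruction set for the classified category and fill the template."""
    return TEMPLATE.format(
        instructions=INSTRUCTIONS[category],
        chat_history=chat_history,
        human_input=human_input,
    )

prompt = build_prompt("support", "Human: Hi\nChatbot: Hello!", "My app crashes on start.")
print(prompt)
```

The final prompt is what gets sent to phi-2, so the model sees both the conversation so far and the topic-specific instructions.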
This project is a continuation of Simple phi-2 chatbot, so check there for the basics of how it was done.
This chatbot demo was done in the following steps:
As we need to classify the input sentence into one of three categories (`support`, `sales`, `joke`), the first thing to do was to find a suitable text classifier. We opted for a zero-shot classifier because we have no samples from any of the categories. After some exploration, and after trying several zero-shot classifiers from the Model Hub, we reached the conclusion that `phi-2` was also the best model for classification. You can see the different experiments and results in `experiments.ipynb`.
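One common way to use a causal LM like phi-2 as a zero-shot classifier is to prompt it with the candidate labels and parse the label it emits. The sketch below is a minimal illustration of that pattern: `fake_generate` is a stub standing in for the real model call, and the prompt wording is an assumption, not the one used in `experiments.ipynb`:

```python
LABELS = ["support", "sales", "joke"]

def classification_prompt(text: str) -> str:
    # Ask the model to answer with exactly one of the candidate labels.
    return (
        f"Classify the following message into one of: {', '.join(LABELS)}.\n"
        f"Message: {text}\n"
        "Category:"
    )

def parse_label(generated: str) -> str:
    """Return the first known label found in the model's continuation,
    falling back to 'support' when none is recognised."""
    lowered = generated.lower()
    for label in LABELS:
        if label in lowered:
            return label
    return "support"

def fake_generate(prompt: str) -> str:
    # Stub standing in for a real phi-2 generate() call.
    return " joke\n"

print(parse_label(fake_generate(classification_prompt("Knock knock!"))))
```

The fallback label and the parsing strategy are design choices for the sketch; a real run would also want to constrain generation length so the model only emits the label.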
With our classifier ready, the next step was to get a sense of how we could use `langchain` to complete the project. In `experiments_langchain.ipynb` you can see the different steps we took in order to get from the user message to a complete chat-like prompt like the one shown above.
Next, to use the `langchain` chain in streaming mode (so that we get the effect of writing the text word by word, instead of one big chunk at a time), we had to break the chain into two parts. The first part is in charge of classifying the text and composing the full prompt we'll use; the second part is the call to the language model in streaming mode.
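The two-part split can be sketched with plain Python generators: a blocking first step produces the full prompt, and a second step streams tokens as they arrive. Both steps are stubbed here (`compose_prompt` would really run the classifier, and `stream_llm` would really call the model with a streaming-capable generate):

```python
from typing import Iterator

def compose_prompt(human_input: str) -> str:
    # Part 1 (blocking): classification + prompt composition.
    # A real implementation would run the zero-shot classifier here.
    category = "support"  # placeholder classification result
    return f"[{category}] Human: {human_input}\nChatbot:"

def stream_llm(prompt: str) -> Iterator[str]:
    # Part 2 (streaming) stub: a real implementation would yield tokens
    # from the model as they are generated.
    for word in ("Hello!", "How", "can", "I", "help?"):
        yield word + " "

def chat_stream(human_input: str) -> Iterator[str]:
    prompt = compose_prompt(human_input)   # runs to completion first
    yield from stream_llm(prompt)          # then tokens stream out

print("".join(chat_stream("My app crashes")))
```

Because the classification step is short, doing it as a blocking call before streaming starts adds little perceived latency; only the long model response needs to stream.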
Finally, the code was adapted to run on different configurations: using the `gradio` command, on a local Jupyter notebook, or on Google Colab. Check the Quickstart section.
- Apart from language models, `langchain` offers ChatModels as well. The difference between the two comes down to the level of abstraction: while standard language models work on a text-in/text-out basis, ChatModels work with the concept of "chat messages". However, they were not considered necessary for this project, because the task at hand is simple enough and doesn't need an extra level of abstraction.
- As explained in Adapt chain to work in streaming mode, we split the chain in two parts to implement the streaming mode. We weren't able to enable streaming using a full chain instead. We followed instructions from this forum but with no luck. We believe there has to be a way to implement streaming within a chain; however, it may not be implemented yet, as we can see here (mind that the `HuggingFacePipeline` object suggested in the forum appears not to have streaming capabilities). Anyway, the solution of splitting the chain was quick and worked, so it wasn't justified to keep searching for a better one.
- There is a chat interface built by the Hugging Face team called ChatUI. It's visually very appealing, but it had two limitations regarding this project:
  - It was more complex to use and to set up than `gradio.ChatInterface()`.
  - It does not use Gradio; it's a SvelteKit app (mind that we're instructed to use `gradio`).
- We classify the text based solely on the last message of the user. This can lead to some misclassification issues: for example, the classifier may think we're joking when we provide some numerical data. This makes sense, because the classifier does not have access to the full context. Although this could be a nice-to-have improvement, our focus now is on other requirements.
- Ensure security and appropriate responses. There is no security mechanism in place to protect against attacks like prompt injection, and we do not ensure that the responses generated by the model are harmless and appropriate either. This could be interesting for a production project, both to ensure the chatbot responds politely (it does not give rude or discriminatory responses, for example) and to adapt its tone when talking to users (for example, with a very angry customer).