This project implements a simple chatbot using the `microsoft/phi-2` model.
The goal is to create a chatbot that classifies the user input into three classes (support, commercial, and joke) and responds accordingly, using a different prompt template depending on the classification output.
Here you can see a GIF showcasing a short conversation.
If you want to quickly run the code, you have three options:

- The easiest way is to run the Not-so-simple phi-2 chatbot notebook on Colab
- Download the code and run the `not_so_simple_phi2_chatbot.ipynb` notebook (you need to install `requirements.txt`)
- Download the code and run `gradio app.py` (you need to install `requirements.txt`)
⚠️ Warning: Inference is much faster on GPU than on CPU.
In addition to the functionality above, we also implemented a simple LoRA fine-tuning notebook (`finetune.ipynb`). It fine-tunes the model with a toy conversation dataset.
You can also run the `gradio` app using Docker. Build the image with `sudo docker build -t gradio-app-image .` and then run it with `sudo docker run -p 7860:7860 -t gradio-app-image`.
Broadly speaking, the solution to this problem was to:
- Take the input query from the user (`human_input`) and classify it into one of the three categories
- Define a different instruction set (`instructions`) depending on the classification result
- Retrieve the history of past chats (`chat_history`)
- Put everything together in a template like the one below.
```
You are a chatbot having a conversation with a human. \
Follow the given instructions to reply to the Human message below.

Instructions:{instructions}

{chat_history}
Human: {human_input}
Chatbot:
```
In this way, the chatbot can not only follow the conversation with the human, but also receive specific instructions on how to behave depending on the topic of the conversation.
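The routing described above can be sketched in plain Python. The category names match the project, but the instruction strings here are illustrative placeholders, not the project's actual prompts:

```python
# Illustrative instruction sets per category (placeholder text, not the
# project's actual prompts).
INSTRUCTIONS = {
    "support": "Help the user solve their technical problem politely.",
    "commercial": "Answer questions about pricing and plans.",
    "joke": "Reply with a light-hearted, friendly joke.",
}

TEMPLATE = (
    "You are a chatbot having a conversation with a human. "
    "Follow the given instructions to reply to the Human message below.\n"
    "Instructions:{instructions}\n"
    "{chat_history}\n"
    "Human: {human_input}\n"
    "Chatbot:"
)

def build_prompt(category: str, chat_history: str, human_input: str) -> str:
    """Pick the instruction set for the classified category and fill the template."""
    return TEMPLATE.format(
        instructions=INSTRUCTIONS[category],
        chat_history=chat_history,
        human_input=human_input,
    )

prompt = build_prompt("support", "Human: Hi\nChatbot: Hello!", "My app crashes on start.")
print(prompt)
```

The final prompt is what gets sent to phi-2, so the model sees both the conversation so far and the topic-specific instructions.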
This project is a continuation of Simple phi-2 chatbot, so check there for the basics of how it was done.
This chatbot demo was done in the following steps:
As we need to classify the input sentence into one of three categories (`support`, `sales`, `joke`), the first thing to do was to find a suitable text classifier. We opted for a zero-shot classifier because we have no samples from any of the categories. After some exploration, and after trying several zero-shot classifiers from the Model Hub, we reached the conclusion that `phi-2` was also the best model for classification. You can see the different experiments and results in `experiments.ipynb`.
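One common way to use a causal LM like phi-2 as a zero-shot classifier is to prompt it with the candidate labels and parse the label it emits. The sketch below is a minimal illustration of that pattern: `fake_generate` is a stub standing in for the real model call, and the prompt wording is an assumption, not the one used in `experiments.ipynb`:

```python
LABELS = ["support", "sales", "joke"]

def classification_prompt(text: str) -> str:
    # Ask the model to answer with exactly one of the candidate labels.
    return (
        f"Classify the following message into one of: {', '.join(LABELS)}.\n"
        f"Message: {text}\n"
        "Category:"
    )

def parse_label(generated: str) -> str:
    """Return the first known label found in the model's continuation,
    falling back to 'support' when none is recognised."""
    lowered = generated.lower()
    for label in LABELS:
        if label in lowered:
            return label
    return "support"

def fake_generate(prompt: str) -> str:
    # Stub standing in for a real phi-2 generate() call.
    return " joke\n"

print(parse_label(fake_generate(classification_prompt("Knock knock!"))))
```

The fallback label and the parsing strategy are design choices for the sketch; a real run would also want to constrain generation length so the model only emits the label.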
With our classifier ready, the next step was to get a sense of how we could use `langchain` to complete the project. In `experiments_langchain.ipynb` you can see the different steps we took in order to get from the user message to a complete chat-like prompt like the one shown above.
Next, to use the `langchain` chain in streaming mode (so that we get the effect of writing the text word by word, instead of one big chunk at a time), we had to break the chain into two parts. The first part is in charge of classifying the text and composing the full prompt we'll use; the second part is the call to the language model in streaming mode.
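The two-part split can be sketched with plain Python generators: a blocking first step produces the full prompt, and a second step streams tokens as they arrive. Both steps are stubbed here (`compose_prompt` would really run the classifier, and `stream_llm` would really call the model with a streaming-capable generate):

```python
from typing import Iterator

def compose_prompt(human_input: str) -> str:
    # Part 1 (blocking): classification + prompt composition.
    # A real implementation would run the zero-shot classifier here.
    category = "support"  # placeholder classification result
    return f"[{category}] Human: {human_input}\nChatbot:"

def stream_llm(prompt: str) -> Iterator[str]:
    # Part 2 (streaming) stub: a real implementation would yield tokens
    # from the model as they are generated.
    for word in ("Hello!", "How", "can", "I", "help?"):
        yield word + " "

def chat_stream(human_input: str) -> Iterator[str]:
    prompt = compose_prompt(human_input)   # runs to completion first
    yield from stream_llm(prompt)          # then tokens stream out

print("".join(chat_stream("My app crashes")))
```

Because the classification step is short, doing it as a blocking call before streaming starts adds little perceived latency; only the long model response needs to stream.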
Finally, the code was adapted to run on different configurations: using the `gradio` command, on a local Jupyter notebook, or on Google Colab. Check the Quickstart section.
- Apart from language models, `langchain` offers ChatModels as well. The difference between the two comes down to the level of abstraction: while standard language models work on a text-in/text-out basis, ChatModels work with the concept of "chat messages". However, they were not considered necessary for this project, because the task at hand is simple enough and doesn't need an extra level of abstraction.
- As explained in Adapt chain to work in streaming mode, we split the chain in two parts to implement the streaming mode. We weren't able to enable streaming using a full chain instead. We followed instructions from this forum but with no luck. We believe there has to be a way to implement streaming within a chain; however, it may not be implemented yet, as we can see here (mind that the `HuggingFacePipeline` object suggested in the forum appears not to have streaming capabilities). Anyway, the solution of splitting the chain was quick and worked, so it wasn't justified to keep searching for a better one.
- There is a chat interface built by the Hugging Face team called ChatUI. It's visually very appealing, but it had two limitations regarding this project:
  - It was more complex to use and to set up than `gradio.ChatInterface()`.
  - It does not use Gradio; it's a SvelteKit app (mind that we're instructed to use `gradio`).
- We classify the text based solely on the last message of the user. This can lead to some misclassification issues: for example, the classifier may think we're joking when we provide some numerical data. This makes sense, because the classifier does not have access to the full context. Although this could be a nice-to-have improvement, our focus now is on other requirements.
- Ensure security and appropriate responses. There is no security mechanism in place to protect against attacks like prompt injection, and we do not ensure that the responses generated by the model are harmless and appropriate either. This could be interesting for a production project, both to ensure the chatbot responds politely (it does not give rude or discriminatory responses, for example) and to adapt its tone when talking to users (for example, with a very angry customer).