This project focuses on the evaluation, optimization, and enhancement of a Retrieval-Augmented Generation (RAG) pipeline for a chatbot designed to assist in therapy sessions. The chatbot leverages a combination of retrieval and generation models to provide contextually relevant and accurate responses based on pre-embedded transcripts from therapy sessions. Key functionalities include creating customized questionnaires, adding real-time notes, and fine-tuning the model for improved performance.
YouTube link: Therapist Helper ChatBot
Report link: Therapist Helper Chatbot.pdf
- Introduction
- Setup
- Usage
- Creating Questionnaires
- Adding Notes
- Fine-Tuning the Model
- Performance Metrics
- Improvements Implemented
- Challenges
- Error Handling and Troubleshooting
- Contributions
- Contact Info
- License
The Therapist Helper chatbot is designed to assist therapists by providing contextually relevant information, generating customized questionnaires, and allowing therapists to add real-time notes during sessions. The bot uses OpenAI's language models for both retrieval and generation tasks, ensuring a comprehensive understanding of the user's queries and generating accurate responses. The model is fine-tuned for better alignment with therapy-specific needs.
- Python 3.7 or higher
- Required Python libraries:
OpenAI
,scikit-learn
,numpy
,pandas
,transformers
,tiktoken
,Streamlit
,FastAPI
,python-dotenv
-
Clone the repository:
git clone https://github.com/Mansi-Dabriwal/TherapistHelperBot cd TherapistHelperBot
-
Install the required libraries:
pip install -r requirements.txt
-
Set up OpenAI API key:
- Create a
.env
file in the root directory with the following content:OPENAI_API_KEY=your-openai-api-key
- Create a
- Run the Application:
python fastApi.py
- Run the Streamlit Application:
streamlit run app.py
- Run the Evaluation:
This script calculates the performance metrics and outputs the results.
python evaluation.py
The chatbot can generate a customized questionnaire based on the patient's current feeling and desired feeling. To create a questionnaire:
- Navigate to the 'Create Questionnaire' section in the Streamlit app.
- Choose the patient's current feeling and desired feeling.
- The bot will generate the questions, and you can download the questionnaire as a file.
Therapists can add real-time notes during sessions using the 'Add Notes' functionality. To add notes:
- Use the text box provided in the Streamlit app to input your notes.
- After completing the session, submit your notes.
- The notes will be curated and made available for download.
The model can be fine-tuned to better align with specific therapy-related queries and context.
Steps for Fine-Tuning:
- Prepare the Dataset:
- Collect and preprocess therapy-specific text data that you want the model to learn from.
- Fine-Tune Using the Transformers Library:
from transformers import GPT2LMHeadModel, GPT2Tokenizer, Trainer, TrainingArguments
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
# Tokenize your dataset
# Define the training arguments and fine-tune the model
- Evaluate the Fine-Tuned Model:
- Run the evaluation script to calculate the new performance metrics post fine-tuning.
- The project includes scripts for calculating various performance metrics, such as Context Precision, Context Recall, Faithfulness, and more.
The following metrics are calculated to evaluate the chatbot's performance:
-
Retrieval Metrics:
- Context Precision
- Context Recall
- Context Relevance
- Noise Robustness
-
Generation Metrics:
- Faithfulness
- Answer Relevance
- Information Integration
- Counterfactual Robustness
- Negative Rejection
- Latency
- Enhanced context retrieval to improve recall and relevance.
- Introduced cross-verification of generated answers to increase faithfulness.
- Developed filters and validation checks for better handling of counterfactual and negative queries.
- Added functionality to create customizable questionnaires.
- Integrated note-taking and download features for real-time session documentation.
- Fine-tuned the model for improved alignment with therapy-specific contexts.
- Handling noisy or irrelevant inputs without compromising the quality of retrieved contexts.
- Managing token limits for both retrieval and generation stages.
- Ensuring the faithfulness of generated answers to the source material.
- Balancing flexibility and specificity in the creation of therapy questionnaires.
- RateLimitError: You have exceeded your current quota. Please check your plan and billing details.
- InvalidRequestError: There was an issue with your request, likely due to invalid parameters or incorrect API key.
- Network Issues: Ensure you have a stable internet connection.
- Retry Logic: The application has built-in retry logic for handling temporary rate limits. If the error persists, consider checking your OpenAI plan and usage.
- API Key: Ensure your API key is correct and has the necessary permissions.
Contributions are welcome! Please fork the repository and submit a pull request with your changes. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository.
- Create a new branch (
git checkout -b feature/YourFeature
). - Make your changes.
- Commit your changes (
git commit -am 'Add some feature'
). - Push to the branch (
git push origin feature/YourFeature
). - Create a new Pull Request.
- Name: Mansi Dabriwal
- LinkedIn: https://www.linkedin.com/in/mansi-da
- Email: mansi07dabriwal@gmail.com, dabriwal.m@northeastern.edu
This project is licensed under the MIT License. See the LICENSE file for details.