The RAFT project integrates Retrieval-Augmented Generation (RAG) with fine-tuning approaches to provide a versatile question-answering system. Leveraging the strengths of OpenAI GPT-3.5 Turbo and Google MedPaLM 2, this framework aims to enhance information retrieval and generation capabilities across various domains.
A single vector store is used for document retrieval, ensuring efficiency and speed regardless of the selected generation model.
Seamlessly switch between OpenAI GPT-3.5 Turbo and MedPaLM 2 depending on the specific needs of the query and the context provided by the user.
A user-friendly interface allows for real-time uploading of PDF documents and querying, making the system accessible to users with minimal technical background.
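The model-switching idea above can be sketched in a few lines. This is an illustrative routing function, not the project's actual API: the function name `choose_model`, the keyword list, and the returned identifiers are all assumptions, and a real deployment would likely use an explicit user selection rather than keyword matching.

```python
def choose_model(query: str, prefer_medical: bool = False) -> str:
    """Pick a generation backend for a query.

    Hypothetical routing rule: medical-sounding queries (or an explicit
    flag) go to MedPaLM 2; everything else goes to the fine-tuned
    GPT-3.5 Turbo model. The retriever is shared either way.
    """
    medical_terms = {"diagnosis", "symptom", "dosage", "treatment"}
    if prefer_medical or any(t in query.lower() for t in medical_terms):
        return "medpalm-2"
    return "gpt-3.5-turbo-finetuned"
```

Keeping the retrieval layer fixed and varying only the generator is what lets a single vector store serve both backends.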
- Python 3.11.5 or higher
- Dependencies as listed in the requirements.txt file
To get started with the RAFT project, follow these steps:
- Clone the repository:
git clone <repository_url>
- Navigate to the project directory:
cd RAFT
- Create a Virtual Environment:
python -m venv venv
- Activate the Virtual Environment:
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows
- Install Dependencies:
pip install -r requirements.txt
- Set Environment Variables:
Create a .env file in the project root with the following variables (see the example.env file for a template):
OPENAI_API_KEY=your_openai_api_key_here
FINE_TUNED_MODEL_NAME=ft:gpt-3.5-turbo-0125:your_fine_tuned_model_name
GOOGLE_API_KEY=your_google_api_key_here
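For reference, a .env file of this shape is just KEY=VALUE lines. The app itself would normally load it with a library such as python-dotenv; the minimal stdlib reader below is only a sketch of what that loading amounts to, and the function name `load_env` is illustrative.

```python
from pathlib import Path


def load_env(path: str) -> dict:
    """Minimal .env reader: one KEY=VALUE per line.

    Blank lines, '#' comments, and lines without '=' are skipped.
    A real project would use python-dotenv instead of this sketch.
    """
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```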
To run the Streamlit application, use the following command:
streamlit run app.py
This will launch the Streamlit server, allowing you to upload PDF files and ask questions in real-time.
- Use the file uploader in the Streamlit app to upload one or more PDF files.
- The system will process the PDFs and create a vector store for retrieval.
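Building the vector store typically means splitting the extracted PDF text into overlapping chunks before embedding them. The project likely relies on LangChain's text splitters for this; the sketch below shows the underlying idea with a hypothetical `split_into_chunks` helper, where overlap ensures sentences cut at a chunk boundary still appear whole in a neighbouring chunk.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters.

    Illustrative stand-in for a LangChain text splitter; chunk sizes here
    are characters, and the defaults are assumptions, not the project's.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```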
- After uploading PDF files, enter your question in the text area and click the "Ask" button.
- The system will return an answer based on the selected language model.
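Under the hood, answering follows the usual RAG shape: embed the question, pull the most similar chunks from the vector store, and hand them to the chosen model as context. The real system uses Chroma with learned embeddings; the stdlib sketch below substitutes bag-of-words cosine similarity purely to show the retrieval step, and the `retrieve` function is a hypothetical stand-in.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question.

    Stand-in for the Chroma vector store: real embeddings would replace
    the Counter-based vectors used here for illustration.
    """
    q = Counter(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(c.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

The retrieved chunks would then be concatenated into the prompt for whichever generator (GPT-3.5 Turbo or MedPaLM 2) is selected.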
- LangChain: A framework for building applications with large language models and data.
- Chroma: An open-source embedding database for retrieval.
- OpenAI: Provides the GPT-3.5 Turbo language model.
- Google PaLM: A generative language model from Google.
Contributions to this project are welcome. Please fork the repository and submit a pull request for any changes or improvements.