This project implements a Retrieval-Augmented Generation (RAG) system to create a chatbot that can answer questions about book reviews and opinions. The chatbot uses reports from Goodreads and Reddit to provide comprehensive information about popular psychology books.
README.md
poetry.lock
pyproject.toml
reports/
├── goodreads_report.txt
└── reddit_report.txt
src/
└── rag_book_reviews/
    ├── __init__.py
    ├── vector_db.py
    ├── chat_interface.py
    ├── chainlit_app.py
    ├── populate_db.py
    └── read_reports.py
- `vector_db.py`: Handles the creation and management of the vector database using DeepLake.
- `chat_interface.py`: Implements the chatbot interface using LangChain and OpenAI's GPT model.
- `read_reports.py`: Reads and processes the Goodreads and Reddit reports.
- `populate_db.py`: Populates the vector database with the processed reports.
- `chainlit_app.py`: The entry point of the application, integrating Chainlit for the chat interface.
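
To illustrate the "reads and processes" step, here is a minimal sketch of how report files could be split into chunks tagged with their source file. It is not the project's actual `read_reports.py`: the function names `split_into_chunks` and `load_report_chunks`, the character-window chunking, and the default sizes are all assumptions made for illustration.

```python
from pathlib import Path


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    # Stop early enough that the last window is not fully contained in the previous one.
    starts = range(0, max(len(text) - overlap, 1), step)
    windows = (text[i:i + chunk_size] for i in starts)
    return [w for w in windows if w.strip()]


def load_report_chunks(paths, chunk_size: int = 1000, overlap: int = 100) -> list[dict]:
    """Read each report and return chunks labeled with the file they came from."""
    chunks = []
    for path in paths:
        path = Path(path)
        text = path.read_text(encoding="utf-8")
        for index, chunk in enumerate(split_into_chunks(text, chunk_size, overlap)):
            chunks.append({"text": chunk, "source": path.name, "chunk": index})
    return chunks
```

Keeping the file name next to each chunk is what later allows an answer to cite whether it drew on the Goodreads report or the Reddit report.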
- Clone the repository:

      git clone https://github.com/yourusername/rag-book-reviews-chatbot.git
      cd rag-book-reviews-chatbot

- Install dependencies using Poetry:

      poetry install

- Set up environment variables: create a `.env` file in the root directory with the following content:

      OPENAI_API_KEY=your_openai_api_key
      ACTIVELOOP_TOKEN=your_activeloop_token
      ACTIVELOOP_ID=your_activeloop_id

- Populate the vector database:

      python src/rag_book_reviews/populate_db.py

- Run the chatbot using Chainlit:

      chainlit run src/rag_book_reviews/chainlit_app.py

  This will start the Chainlit server and open the chat interface in your default web browser.
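
Because both the populate step and the chatbot depend on the three variables in `.env`, it can help to check them before running anything. The following is a small sketch using only the standard library. The helper names `parse_env_file` and `missing_vars` are hypothetical, and the parser only handles plain `KEY=value` lines; how the project itself loads `.env` is not stated here.

```python
REQUIRED_VARS = ("OPENAI_API_KEY", "ACTIVELOOP_TOKEN", "ACTIVELOOP_ID")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env) -> list[str]:
    """Return the required variable names that are absent or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

For example, `missing_vars(parse_env_file(open(".env").read()))` returns an empty list when all three values are present.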
- The `populate_db.py` script reads the Goodreads and Reddit reports, processes them, and stores them in a DeepLake vector database.
- The `chat_interface.py` module uses LangChain and OpenAI's GPT model to create a retrieval-based question-answering system.
- When a user asks a question, the system uses a RetrievalQAWithSourcesChain to retrieve relevant information from the vector database.
- The retrieved information is used to generate an informed response, and the sources of the information are also returned.
- The Chainlit interface in `chainlit_app.py` provides a user-friendly chat experience, allowing users to interact with the chatbot seamlessly and view the sources of the information provided.
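
The retrieve-then-cite flow described above can be sketched without any external services. The example below is a toy stand-in, not the project's code: it replaces OpenAI embeddings and DeepLake with word-count vectors and an in-memory list, and it stops before the answer-generation step. The names `embed`, `cosine`, and `retrieve_with_sources` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_with_sources(question: str, chunks: list[dict], k: int = 2):
    """Return the context text and source names of the k most similar chunks."""
    query = embed(question)
    scored = [(cosine(query, embed(c["text"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = [c for score, c in scored[:k] if score > 0]  # ignore unrelated chunks
    context = "\n".join(c["text"] for c in top)
    sources = sorted({c["source"] for c in top})
    return context, sources
```

In the real system the context would be passed to the language model to generate the answer, and the returned source names are what the chat interface can display next to it.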
- Retrieval-Augmented Generation for accurate and context-aware responses
- Use of RetrievalQAWithSourcesChain for advanced retrieval and answer generation
- Return of source information for each response, enhancing transparency and credibility
- Integration with DeepLake for efficient vector storage and retrieval
- User-friendly chat interface powered by Chainlit
- Comprehensive book reviews and opinions from Goodreads and Reddit