PDF-ChatBot is a web application built with Streamlit that lets users query multiple PDF files in natural language. It uses Google Generative AI for embeddings and answer generation and FAISS for vector-based search, so responses are drawn from the content of the uploaded PDFs.
The chatbot pairs the Gemini model with a FAISS vector index to retrieve relevant sections from large PDFs, which makes it useful for document analysis, research, and information extraction.
- Multiple PDF Support: Upload and process multiple PDFs simultaneously.
- Natural Language Interaction: Ask questions in natural language and receive detailed answers from the PDF content.
- Vector Search: FAISS-based search to find relevant answers efficiently.
- Google Generative AI Integration: Uses the Gemini model to generate responses based on the extracted content.
- Fast Processing: Large documents are split into chunks and indexed, so each question searches the index rather than the full text.
- Upload PDFs: Upload one or more PDF files through the interface.
- PDF Parsing: The PDFs are parsed, and text is extracted using `PyPDF2`.
- Text Chunking: The extracted text is split into manageable chunks using LangChain's `RecursiveCharacterTextSplitter`.
- Vectorization: Text chunks are converted into vector representations using Google's Generative AI Embeddings.
- Vector Store Creation: A FAISS index is created from these vectors for fast similarity search.
- Ask Questions: Users can input questions through the interface, and the chatbot will search the PDFs and respond with relevant information.
- Response Generation: Based on the search results, the chatbot generates detailed answers using the Gemini model.
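The retrieval steps above can be illustrated with a minimal, standard-library-only sketch. It is not the project's code: the bag-of-words `embed` function stands in for Google's embeddings, the brute-force `top_k` search stands in for a FAISS index, the fixed-window `chunk_text` is a simplification of `RecursiveCharacterTextSplitter`, and all function names and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: word counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Brute-force similarity search over all chunks."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble the text a generative model would receive (hypothetical wording)."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Only the retrieved chunks are placed in the prompt, so the generative model sees a small, relevant slice of the PDFs rather than the full text.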
- Streamlit: Frontend interface for the PDF-ChatBot.
- LangChain: Chains the model calls and provides vector-based document retrieval.
- Google Generative AI: Handles embeddings and conversational model responses.
- FAISS: Efficient similarity search across large text datasets.
- PyPDF2: For PDF parsing and text extraction.
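As a rough sketch of how these libraries might fit together when building the index, consider the function below. It is an illustration and not the repository's actual code: the function name, the chunk sizes, and the embedding model name `models/embedding-001` are assumptions, and the import paths depend on the installed LangChain version. The imports sit inside the function so the snippet loads even when the packages are absent.

```python
from typing import List


def build_vector_store(pdf_paths: List[str], chunk_size: int = 1000, chunk_overlap: int = 200):
    """Extract text from PDFs, chunk it, embed it, and index it in FAISS."""
    # Third-party imports (module paths may differ between library versions).
    from PyPDF2 import PdfReader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_google_genai import GoogleGenerativeAIEmbeddings
    from langchain_community.vectorstores import FAISS

    # Concatenate the text of every page; extract_text() can return None.
    text = ""
    for path in pdf_paths:
        for page in PdfReader(path).pages:
            text += page.extract_text() or ""

    # Split into overlapping chunks (sizes here are assumed, not from the repo).
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    chunks = splitter.split_text(text)

    # Expects GOOGLE_API_KEY in the environment; the model name is an assumption.
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    return FAISS.from_texts(chunks, embedding=embeddings)
```

The returned store could then be queried with `store.similarity_search(question, k=4)` to fetch the chunks most similar to a question.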
- Python 3.8+
- Install dependencies: `pip install -r requirements.txt`
- Set up a `.env` file with your Google API key: `GOOGLE_API_KEY=your_api_key_here`
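To show how the key from `.env` might be picked up at startup, here is a small standard-library sketch. The application itself may well rely on a package such as python-dotenv; the helper names `load_env_file` and `get_google_api_key` are hypothetical and only illustrate the idea of failing early with a clear message when the key is missing.

```python
import os
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_google_api_key(env_path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(env_path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env or export it")
    return key
```

Checking for the key before the first request tends to give a clearer error than a failed API call later on.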
- Clone the repository: `git clone https://github.com/NitinYadav1511/PDF-ChatBot.git`
- Navigate to the project directory: `cd PDF-ChatBot`
- Install the required Python libraries: `pip install -r requirements.txt`
- Run the Streamlit application: `streamlit run main.py`
- Upload PDFs through the sidebar and start asking questions!
- Upload PDFs: Click on "Upload your PDF Files" in the sidebar to upload multiple PDF documents.
- Ask Questions: Enter your question in the text input field, and the bot will search the PDFs and respond with relevant answers.
- Add support for summarizing entire PDFs.
- Improve the conversational flow and context retention across multiple questions.
- Implement a feedback mechanism to refine answer quality.
This repository is maintained by Nitin Yadav.
Contributions are welcome! Feel free to open an issue or submit a pull request.
This project is licensed under the MIT License.