This project is a smart chatbot that reads your PDF, DOCX, or TXT documents and answers questions about their content.
It uses a Retrieval-Augmented Generation (RAG) approach with Google Gemini AI. It first finds the passages of your document that are most relevant to your question, then sends them together with the question to Gemini, which writes the answer in natural language.
Even a child can think of it like this:
- 🏫 Imagine you give your teacher a book (your document)
- You ask the teacher a question about the book
- 🧠 The teacher looks at the book, finds the right part, and then explains the answer to you clearly
That’s exactly what this chatbot does!
✅ Upload PDF, DOCX, or TXT files
✅ Ask questions in plain language
✅ Get answers from Gemini AI, based on the most relevant parts of your document
✅ See all past questions and answers in the chat history
✅ Clear the chat and document at any time
✅ Download the chat history as a TXT file
1️⃣ Clone this project
```bash
git clone <your-repo-url>
cd RAG_Powered_Document_Reader
```
2️⃣ Create a virtual environment & activate it
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
```
3️⃣ Install required dependencies
```bash
pip install streamlit PyPDF2 python-docx sentence-transformers google-generativeai
# or install from the project's requirements file:
pip install -r requirements.txt
```
1️⃣ Set your Gemini API key
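Set it in the same terminal you will use to run the app (the variable only lasts for that terminal session):
```bash
# Windows (Command Prompt)
set GOOGLE_API_KEY=your_api_key_here
# Windows (PowerShell)
$env:GOOGLE_API_KEY="your_api_key_here"
# macOS/Linux
export GOOGLE_API_KEY=your_api_key_here
```
2️⃣ Run the Streamlit app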
```bash
streamlit run rag_chatbot.py
```
3️⃣ Open your browser
Go to http://localhost:8501
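The key you set above has to be visible to the Python process that Streamlit starts. A minimal sketch of how an app could read it and fail with a clear message, assuming it looks up `GOOGLE_API_KEY` through `os.environ`; the helper name `get_gemini_api_key` is hypothetical and not taken from rag_chatbot.py:

```python
import os


def get_gemini_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Read the Gemini API key from the environment (hypothetical helper).

    Raises a RuntimeError with a readable message instead of letting a
    later Gemini call fail with a less obvious authentication error.
    """
    # Surrounding whitespace is stripped in case it slipped in while setting the variable.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; set it in the terminal you run "
            "`streamlit run rag_chatbot.py` from."
        )
    return key
```

Checking for the key once at startup keeps a forgotten `set`/`export` from surfacing only after the first question is asked.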
- Extract text: when you upload a document, the app extracts all of its text.
- Find best matches: when you ask a question, it uses SentenceTransformers embeddings to find the document chunks most relevant to it.
- Ask Gemini: it sends those chunks plus your question to Google Gemini AI.
- Show answer: Gemini writes a clear, simple answer, which appears in the chat.
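The "find best matches" step can be sketched in plain Python. This is a minimal illustration, not the project's qa_engine.py: it splits the text into overlapping word windows and ranks them by bag-of-words cosine similarity, which stands in for the SentenceTransformers embeddings the app uses. The function names and the `chunk_size`/`overlap` defaults are assumptions.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of chunk_size words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a simple stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question, best first."""
    q_vec = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q_vec, bag_of_words(c)), reverse=True)
    return ranked[:k]
```

With SentenceTransformers, the word counts would be replaced by embedding vectors from `model.encode(...)` and compared with `sentence_transformers.util.cos_sim`; the chunk-then-rank structure stays the same. The overlap keeps a sentence that straddles two windows from being cut off in both.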
```text
RAG_Powered_Document_Reader/
├── rag_chatbot.py      # Main Streamlit app
├── file_loader.py      # Extracts text from PDF/DOCX/TXT
├── qa_engine.py        # Handles embeddings + Gemini QA
├── requirements.txt    # Dependencies list
├── .gitignore          # Ignore unnecessary files
└── README.md           # This file
```
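file_loader.py is described as extracting text from PDF, DOCX, and TXT files. A minimal sketch of that idea, assuming a path-based `extract_text()` helper (a hypothetical name) built on PyPDF2's `PdfReader` and python-docx's `Document`:

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return plain text from a .pdf, .docx, or .txt file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        # errors="replace" keeps odd bytes from crashing the upload.
        return Path(path).read_text(encoding="utf-8", errors="replace")
    if suffix == ".pdf":
        # Imported lazily so TXT files work even without PyPDF2 installed.
        from PyPDF2 import PdfReader

        reader = PdfReader(path)
        # extract_text() can come back empty for scanned, image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        from docx import Document  # provided by the python-docx package

        return "\n".join(paragraph.text for paragraph in Document(path).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix}")
```

Streamlit's `st.file_uploader` returns a file-like object rather than a path; `PdfReader` and `Document` both accept file-like objects, so an app may pass the upload to them directly instead of saving it first.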
1️⃣ Upload any PDF, DOCX, or TXT document
2️⃣ Type your question in the text box
3️⃣ The chatbot reads your document and answers
4️⃣ Clear the chat and the document whenever you like
5️⃣ Download the chat history as TXT
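Downloading the chat history as TXT mostly comes down to turning the stored questions and answers into one string. A minimal sketch, assuming the history is kept as a list of (question, answer) pairs; `format_chat_history` is a hypothetical name:

```python
def format_chat_history(history: list[tuple[str, str]]) -> str:
    """Turn (question, answer) pairs into numbered plain text for download."""
    lines = []
    for i, (question, answer) in enumerate(history, start=1):
        lines.append(f"Q{i}: {question}")
        lines.append(f"A{i}: {answer}")
        lines.append("")  # blank line between exchanges
    return "\n".join(lines)
```

In Streamlit, the resulting string could be passed to `st.download_button("Download chat history", data=..., file_name="chat_history.txt", mime="text/plain")`.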
- You upload a Company Policy PDF
- You ask:
“What is the company’s leave policy?”
- The chatbot finds the right section and answers:
“The company allows 20 paid leaves per year and requires 2 days prior notice.”
- Streamlit: Web app interface
- PyPDF2 / python-docx: Extract text from documents
- SentenceTransformers: Find most relevant text chunks
- Google Gemini AI: Generate natural answers
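The final step, sending the retrieved text and the question to Gemini, can be sketched as below. The prompt wording, the function names, and the default model name `gemini-1.5-flash` are assumptions; the calls themselves (`genai.configure`, `genai.GenerativeModel`, `generate_content`) come from the google-generativeai package this project installs.

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt.

    The instruction wording is an assumption, not the app's actual prompt.
    """
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def ask_gemini(prompt: str, api_key: str, model_name: str = "gemini-1.5-flash") -> str:
    """Send the prompt to Gemini and return the generated text.

    model_name is an assumed default; use a Gemini model your key can access.
    """
    # Imported lazily so build_prompt() works without the package installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(prompt)
    return response.text
```

Keeping the prompt builder separate from the API call makes it easy to inspect exactly what context Gemini receives, and asking the model to answer only from that context is a common way to keep answers tied to the document.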
This project is free to use for learning purposes.
Now you can ask your documents anything, just like chatting with a super-smart friend!