Your AI-powered chatbot for PDFs and web pages, built with Streamlit, LangChain, and Gemini 2.5 Pro.
Live link: https://chat-with-pdf-or-webpage.onrender.com/
| Feature | Description |
|---|---|
| PDF Chat | Drag and drop any PDF and start chatting instantly. |
| Website Chat | Paste any URL to load its content and chat with it. |
| Gemini 2.5 Pro | Powered by Google's Gemini 2.5 Pro reasoning model. |
| Vector Store | Fast, persistent embeddings with Chroma. |
| Streamlit UI | Responsive sidebar and chat interface. |
| Stateful | Retains chat history and vectorstore across sessions. |
git clone https://github.com/<your-org>/chat-pdf-web.git
cd chat-pdf-web
pip install -r requirements.txt
# or
pip install .

Python ≥ 3.10 recommended.

streamlit run main.py

A browser tab will open at http://localhost:8501.

In the sidebar:
- Enter your Gemini API key.
- Upload a PDF or enter a website URL.
- Click Ingest and start chatting!
| Step | UI/CLI |
|---|---|
| 1. Provide key | Sidebar: "Enter your Gemini API key…" |
| 2. Upload PDF | "Upload PDF" file picker. |
| 3. Add website | "Enter website URL" input. |
| 4. Ingest | Click Ingest (downloads, chunks, and embeds the content). |
| 5. Chat | Type questions in the chat box. |
| 6. Clear everything | Click "End Chat" to reset. |
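
The chunking part of the Ingest step can be illustrated with a minimal sketch. It assumes a plain character-window splitter using the README's default values of 1000 and 200 for `CHUNK_SIZE` and `CHUNK_OVERLAP`; the function name `split_text` is hypothetical, and the app's real splitter (a LangChain text splitter) may break on separators rather than at fixed offsets.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; not the project's actual splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    pieces = split_text("x" * 2500)
    print([len(p) for p in pieces])  # [1000, 1000, 900]
```

The overlap means the last 200 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still fully contained in at least one chunk.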
| Layer | Stack |
|---|---|
| UI | Streamlit, streamlit-extras |
| LLM | Google gemini-2.5-pro, google-generativeai |
| Embeddings | gemini-embedding-001 |
| Orchestration | LangChain |
| Vector DB | ChromaDB (persisted) |
| PDF parsing | PyPDFLoader via pypdf |
| Web scraping | WebBaseLoader, BeautifulSoup4 |
| Config | pyproject.toml (PEP 621 metadata) |
graph LR
A[User inputs PDF/URL] --> B{Ingestion}
B -->|PDF| C[PyPDFLoader]
B -->|URL| D[WebBaseLoader]
C & D --> E[LangChain chunks]
E --> F[Embed with Gemini]
F --> G[Chroma Vector Store]
H[User question] --> I[RAGChatBot.retrieve]
I --> G
G --> J[LLM.answer]
J --> K[Show response in UI]
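
The question path in the diagram (retrieve, then answer) can be sketched with the standard library alone. This is a toy under stated assumptions: word counts stand in for Gemini embeddings, an in-memory list stands in for the Chroma store, and `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names rather than the project's `RAGChatBot` API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = [
        "Chroma stores embeddings on disk.",
        "Streamlit renders the chat interface.",
        "PDF pages are parsed with pypdf.",
    ]
    question = "Where does Chroma store embeddings?"
    print(build_prompt(question, retrieve(question, store, k=1)))
```

In the real app the similarity search would be delegated to the vector store and the prompt sent to Gemini; the sketch only shows the order of operations.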
| Var | Default | Purpose |
|---|---|---|
| GOOGLE_API_KEY | (none) | Falls back to sidebar input. |
| CHROMA_PERSIST_DIR | ./chroma_db | Vectorstore path. |
| CHUNK_SIZE | 1000 | Text-split parameter. |
| CHUNK_OVERLAP | 200 | Text-split parameter. |
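
As an illustration of how these variables and defaults might be read, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical and not taken from the project's code; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    google_api_key: Optional[str]
    chroma_persist_dir: str
    chunk_size: int
    chunk_overlap: int


def load_settings(
    env: Optional[Mapping[str, str]] = None,
    sidebar_key: Optional[str] = None,
) -> Settings:
    """Read configuration from the environment, applying the documented defaults.

    `sidebar_key` models the key typed into the sidebar; it is used only
    when GOOGLE_API_KEY is unset or empty.
    """
    env = os.environ if env is None else env
    return Settings(
        google_api_key=env.get("GOOGLE_API_KEY") or sidebar_key,
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./chroma_db"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
    )


if __name__ == "__main__":
    print(load_settings(env={}, sidebar_key="key-from-sidebar"))
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.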
We love community contributions!
Please see CONTRIBUTING.md for guidelines.
- Fork the repository.
- Run pre-commit install.
- Commit and push on a feature branch.
- Create a PR.
- Streamlit: docs.streamlit.io
- LangChain: langchain.dev
- Google AI: ai.google.dev
MIT © 2024 Build Fast with AI and contributors.