This application lets you upload a PDF or add a web page URL, then chat with the content.

📚 chat-pdf-web

Your AI-powered chatbot for PDFs and web pages, built with Streamlit, LangChain, and Gemini 2.5 Pro.

Live Link 👉 https://chat-with-pdf-or-webpage.onrender.com/

✨ Features

Feature Description
πŸ—‚οΈ PDF Chat Drag-and-drop any PDF and start chatting instantly.
🌐 Website Chat Paste any URL to browse & chat with its content.
🧠 Gemini 2.5 Pro Powered by Google’s newest reasoning model.
πŸ” Vector Store Fast & persistent embeddings with Chroma.
πŸ–ΌοΈ Streamlit UI Responsive sidebar and chat interface.
πŸ” State-ful Retains chat history and vectorstore across sessions.

🚀 Quick Start

1. Clone

git clone https://github.com/<your-org>/chat-pdf-web.git
cd chat-pdf-web

2. Install

pip install -r requirements.txt
# or
pip install .

Python ≥ 3.10 recommended.

3. Launch

streamlit run main.py

A browser tab will open at http://localhost:8501.

4. Configure

In the sidebar:

  1. Enter your Gemini API key.
  2. Upload a PDF or enter a website URL.
  3. Click Ingest and start chatting!

📖 Usage

| Step | UI/CLI |
| --- | --- |
| 1. Provide key | Sidebar → “Enter your Gemini API key…” |
| 2. Upload PDF | “Upload PDF” file picker. |
| 3. Add website | “Enter website URL” input. |
| 4. Ingest | Click Ingest (downloads, chunks, embeds). |
| 5. Chat | Type questions in the chat box. |
| 6. Clear everything | Click “End Chat” to reset. |
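
The chunking part of the Ingest step can be pictured with a small, dependency-free sketch. This is not the app's actual splitter (the project uses LangChain for chunking); `split_text` is a hypothetical helper that cuts fixed-size character windows, with defaults borrowed from the `CHUNK_SIZE` and `CHUNK_OVERLAP` values listed under Environment Variables.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical stand-in for a LangChain text splitter: it ignores
    sentence and paragraph boundaries and only counts characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

The overlap means neighbouring chunks share some text, so a sentence that straddles a chunk boundary is less likely to be lost at retrieval time.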

🛠 Tech Stack

| Layer | Stack |
| --- | --- |
| UI | Streamlit, streamlit-extras |
| LLM | Google gemini-2.5-pro, google-generativeai |
| Embeddings | gemini-embedding-001 |
| Orchestration | LangChain |
| Vector DB | ChromaDB (persisted) |
| PDF parsing | PyPDFLoader via pypdf |
| Web scraping | WebBaseLoader, BeautifulSoup4 |
| Config | pyproject.toml (PEP 621) |

📐 Data Flow

graph LR
A[User inputs PDF/URL] --> B{Ingestion}
B -->|PDF| C[PyPDFLoader]
B -->|URL| D[WebBaseLoader]
C & D --> E[LangChain chunks]
E --> F[Embed with Gemini]
F --> G[Chroma Vector Store]
H[User question] --> I[RAGChatBot.retrieve]
I --> G
G --> J[LLM.answer]
J --> K[Show response in UI]
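
The question path in the graph (retrieve, then answer) can be illustrated with a toy, in-memory version. Everything here is an assumption made for illustration: `embed` is a word-count stand-in for Gemini embeddings, the chunk list stands in for the Chroma store, and `retrieve` and `build_prompt` are hypothetical names, not the project's `RAGChatBot` API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = [
        "Chroma stores embeddings on disk.",
        "Streamlit renders the chat UI.",
        "Bananas are yellow.",
    ]
    question = "How does Chroma persist embeddings?"
    print(build_prompt(question, retrieve(question, store, k=1)))
```

In the real app the similarity search would run inside the vector store and the prompt would go to the LLM; the sketch only shows the order of operations.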

⚙️ Environment Variables (optional)

| Var | Default | Purpose |
| --- | --- | --- |
| GOOGLE_API_KEY | (none) | Falls back to sidebar input. |
| CHROMA_PERSIST_DIR | ./chroma_db | Vectorstore path. |
| CHUNK_SIZE | 1000 | Text-split parameter. |
| CHUNK_OVERLAP | 200 | Text-split parameter. |
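
As a rough sketch of how these variables and their defaults could be read in Python: the `Settings` class and `load_settings` function below are hypothetical names used for illustration, not code taken from this repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the optional environment variables."""
    google_api_key: Optional[str]  # None means "ask for it in the sidebar"
    chroma_persist_dir: str
    chunk_size: int
    chunk_overlap: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, applying the documented defaults."""
    return Settings(
        google_api_key=env.get("GOOGLE_API_KEY") or None,
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./chroma_db"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment in as a mapping keeps the function easy to test, for example `load_settings({"CHUNK_SIZE": "500"})`.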

🀝 Contributing

We love community contributions!
Please see CONTRIBUTING.md for guidelines.

TL;DR

  1. Fork.
  2. pre-commit install.
  3. Commit & push on a feature branch.
  4. Create a PR 🚀

📄 License

MIT © 2024 Build Fast with AI and contributors.