Demo video: demo.mp4
The Agentic RAG System is an AI-powered document intelligence platform that enables users to extract insights from uploaded files (PDFs, Word docs, text) or web URLs through natural language queries. Built with Python/Flask and LangChain, the system uses a multi-agent workflow to intelligently process documents, retrieve relevant information from a vector database (ChromaDB), and generate human-like answers, seamlessly falling back to Wikipedia when needed. The responsive web interface (HTML/CSS/Bootstrap) allows users to ask questions conversationally, while the modular backend demonstrates robust error handling, logging, and secure file processing.
Try it now: AutoDocThinker: Agentic RAG System with Intelligent Search Engine
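For orientation, here is a minimal sketch of what the Flask layer described above might look like. The route names (`/upload`, `/ask`) and the placeholder handler bodies are illustrative assumptions, not the project's actual API; in the real app the handlers would call the document-processing and agent-orchestration modules.

```python
# Minimal Flask sketch (route names and handler bodies are illustrative only).
import os

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
UPLOAD_DIR = "uploads"
os.makedirs(UPLOAD_DIR, exist_ok=True)


@app.route("/upload", methods=["POST"])
def upload():
    # Persist the uploaded file with a sanitized name before processing.
    file = request.files["file"]
    path = os.path.join(UPLOAD_DIR, secure_filename(file.filename))
    file.save(path)
    # The real app would parse, chunk, embed, and index the file here.
    return jsonify({"status": "indexed", "file": file.filename})


@app.route("/ask", methods=["POST"])
def ask():
    question = request.json["question"]
    # The real app would run the agent workflow here and return its answer.
    return jsonify({"answer": f"(placeholder) you asked: {question}"})


if __name__ == "__main__":
    app.run(debug=True)
```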
| # | Module | Technology Stack | Implementation Details |
|---|--------|------------------|-------------------------|
| 1 | LLM Processing | Groq + LLaMA-3-70B | Configured with a low temperature (0.2) and token limits |
| 2 | Document Parsing | PyMuPDF + python-docx | Handles PDF, DOCX, and TXT with metadata preservation |
| 3 | Text Chunking | RecursiveCharacterTextSplitter | 500-character chunks with 20% (100-character) overlap for context (see the sketch after this table) |
| 4 | Vector Embeddings | all-MiniLM-L6-v2 | Efficient 384-dimensional embeddings |
| 5 | Vector Database | ChromaDB | Local persistent storage with cosine similarity |
| 6 | Agent Workflow | LangGraph | 7 specialized nodes with conditional routing |
| 7 | Planner Agent | LangGraph planner node | Generates execution plans |
| 8 | Executor Agent | LangGraph node | Orchestrates tool calls |
| 9 | Web Fallback | Wikipedia API | Auto-triggered when document confidence falls below a threshold |
| 10 | Memory System | deque(maxlen=3) | Maintains a short conversation-history buffer |
| 11 | User Interface | HTML, CSS, Bootstrap, JS | Interactive web app with file, URL, and text input |
| 12 | Containerization | Docker | Portable deployment |
| 13 | CI/CD Pipeline | GitHub Actions | Automated linting and testing |
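Rows 3, 4, 5, and 10 compose the indexing pipeline. The snippet below is a minimal sketch of how those pieces fit together; it assumes recent `langchain` / `langchain-community` import paths (which vary by version), and the sample text, directory names, and variable names are illustrative rather than the repository's actual code.

```python
from collections import deque

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# Text extracted by PyMuPDF / python-docx would arrive here as one string.
raw_text = "AutoDocThinker indexes uploaded documents for question answering. " * 40

# Row 3: 500-character chunks with ~20% (100-character) overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_text(raw_text)

# Rows 4-5: 384-dimensional all-MiniLM-L6-v2 embeddings in a local, persistent Chroma store.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = Chroma.from_texts(chunks, embeddings, persist_directory="vector_db")

# Row 10: a short conversation buffer that keeps only the last three exchanges.
history = deque(maxlen=3)

question = "What does AutoDocThinker index?"
docs = db.similarity_search(question, k=3)
history.append({"question": question, "context": [d.page_content for d in docs]})
print(docs[0].page_content[:200])
```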
```
AutoDocThinker/
├── .github/
│   └── workflows/
│       └── main.yml
│
├── agents/
│   ├── __init__.py
│   ├── document_processor.py
│   └── orchestration.py
│
├── data/
│   └── sample.pdf
│
├── notebooks/
│   └── experiment.ipynb
│
├── static/
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── script.js
│
├── templates/
│   └── index.html
│
├── tests/
│   └── test_app.py
│
├── uploads/
│
├── vector_db/
│   ├── chroma_collection/
│   └── chroma.sqlite3
│
├── app.log
├── app.py
├── demo.mp4
├── demo.png
├── Dockerfile
├── LICENSE
├── render.yaml
├── README.md
├── requirements.txt
└── setup.py
```
```mermaid
%% Agentic RAG System Architecture
%% The planner/retriever/fallback routing is also sketched in Python after this diagram.
graph TD
A[User Interface]:::ui -->|Upload/Input| B[Flask Web Server]:::server
B --> C[Tool Router Agent]:::router
C -->|File| D[Document Processor]:::processor
C -->|URL| E[Web Scraper]:::scraper
C -->|Text| F[Text Preprocessor]:::preprocessor
D --> G[PDF/DOCX/TXT Parser]:::parser
E --> H[URL Content Extractor]:::extractor
F --> I[Text Chunker]:::chunker
G --> J[Chunking & Embedding]:::embedding
H --> J
I --> J
J --> K[Vector Database]:::database
B -->|Query| L[Planner Agent]:::planner
L -->|Has Documents| M[Retriever Agent]:::retriever
L -->|No Documents| N[Fallback Agent]:::fallback
M --> K
K --> O[LLM Answer Agent]:::llm
N --> P[Wikipedia API]:::api
P --> O
O --> Q[Response Formatter]:::formatter
Q --> B
B --> A
classDef ui fill:#4e79a7,color:white,stroke:#333;
classDef server fill:#f28e2b,color:white,stroke:#333;
classDef router fill:#e15759,color:white,stroke:#333;
classDef processor fill:#76b7b2,color:white,stroke:#333;
classDef scraper fill:#59a14f,color:white,stroke:#333;
classDef preprocessor fill:#edc948,color:#333,stroke:#333;
classDef parser fill:#b07aa1,color:white,stroke:#333;
classDef extractor fill:#ff9da7,color:#333,stroke:#333;
classDef chunker fill:#9c755f,color:white,stroke:#333;
classDef embedding fill:#bab0ac,color:#333,stroke:#333;
classDef database fill:#8cd17d,color:#333,stroke:#333;
classDef planner fill:#499894,color:white,stroke:#333;
classDef retriever fill:#86bcb6,color:#333,stroke:#333;
classDef fallback fill:#f1ce63,color:#333,stroke:#333;
classDef llm fill:#d37295,color:white,stroke:#333;
classDef api fill:#a0d6e5,color:#333,stroke:#333;
classDef formatter fill:#b3b3b3,color:#333,stroke:#333;
```
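The query path in the diagram (the Planner Agent routing to either the Retriever Agent or the Wikipedia fallback, both feeding the LLM Answer Agent) maps naturally onto LangGraph's conditional edges. The sketch below illustrates that routing only; the node names, `AgentState` fields, and stubbed node bodies are assumptions, and the real graph has seven nodes rather than the four shown here.

```python
from typing import List, TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict):
    question: str
    has_documents: bool
    context: List[str]
    answer: str


def planner(state: AgentState) -> AgentState:
    # Decide whether indexed documents exist for this question.
    return state


def retriever(state: AgentState) -> AgentState:
    # Would query ChromaDB here; stubbed for the sketch.
    return {**state, "context": ["<top document chunks>"]}


def wikipedia_fallback(state: AgentState) -> AgentState:
    # Would call the Wikipedia API when no documents are indexed.
    return {**state, "context": ["<wikipedia summary>"]}


def answer(state: AgentState) -> AgentState:
    # Would prompt the Groq-hosted LLaMA-3 model with the gathered context.
    return {**state, "answer": f"Answer based on: {state['context'][0]}"}


graph = StateGraph(AgentState)
graph.add_node("planner", planner)
graph.add_node("retriever", retriever)
graph.add_node("fallback", wikipedia_fallback)
graph.add_node("answer", answer)

graph.set_entry_point("planner")
graph.add_conditional_edges(
    "planner",
    lambda s: "retriever" if s["has_documents"] else "fallback",
)
graph.add_edge("retriever", "answer")
graph.add_edge("fallback", "answer")
graph.add_edge("answer", END)

app = graph.compile()
result = app.invoke({"question": "What is RAG?", "has_documents": False})
print(result["answer"])
```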
- Corporate HR Automation
- Legal Document Review
- Academic Research
- Customer Support
- Healthcare Compliance
- Financial Analysis
- Media Monitoring
- Education
- Technical Documentation
- Government Transparency
```bash
# 1. Clone the repository
git clone https://github.com/Md-Emon-Hasan/AutoDocThinker.git
cd AutoDocThinker

# 2. Install dependencies
pip install -r requirements.txt
```
Or with Docker:
```bash
# Build the Docker image
docker build -t auto-doc-thinker .

# Run the container
docker run -p 8501:8501 auto-doc-thinker
```
`.github/workflows/main.yml`

```yaml
name: CI

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Lint with flake8
        run: |
          pip install flake8
          flake8 .
```
- ✅ Multilingual document ingestion
- ✅ Audio document ingestion + Whisper
- ⏳ Long-term memory + history viewer
- ⏳ MongoDB/FAISS alternative to Chroma
- ✅ More tools (WolframAlpha, SerpAPI)
- ⏳ Model selection dropdown (Gemini, LLaMA, GPT-4)
Md Emon Hasan · Email: email · LinkedIn: md-emon-hasan · GitHub: Md-Emon-Hasan · Facebook: mdemon.hasan2001 · WhatsApp: 8801834363533