🧠 AutoDocThinker: Agentic RAG System with Intelligent Search Engine

🎥 Demo video: demo.mp4

🎯 Project Overview

The Agentic RAG System is an AI-powered document intelligence platform that lets users extract insights from uploaded files (PDF, Word, and plain-text documents), web URLs, or pasted text through natural-language queries. Built with Python/Flask, LangChain, and LangGraph, the system uses a multi-agent workflow to process documents, retrieve relevant passages from a vector database (ChromaDB), and generate conversational answers. When the uploaded content cannot answer a question, it falls back to Wikipedia. The responsive web interface (HTML/CSS/Bootstrap) supports conversational follow-up questions, and the modular backend includes error handling, logging, and secure file processing.
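The retrieve-or-fallback decision described above can be illustrated in plain Python. This is a minimal sketch that assumes chunks are already embedded as vectors. It ranks chunks by cosine similarity (the metric the feature table lists for ChromaDB) and routes to the Wikipedia fallback when nothing is indexed or the best score falls below a threshold. `retrieve_or_fallback`, `CONFIDENCE_THRESHOLD`, and the 0.35 value are hypothetical. The project itself queries ChromaDB rather than scoring vectors by hand, and its real threshold is not documented in this README.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical threshold; the README does not state the value the project uses.
CONFIDENCE_THRESHOLD = 0.35


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_or_fallback(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 3,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Return ("documents", top-k hits), or ("wikipedia", []) when confidence is too low."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:k]
    # Nothing indexed, or the best match is below the threshold: use the web fallback.
    if not top or top[0][1] < threshold:
        return "wikipedia", []
    return "documents", top


# Toy 2-dimensional vectors stand in for real 384-dimensional embeddings.
chunks = [
    ("Refunds are processed within 5 business days.", [1.0, 0.0]),
    ("The office is open 9am to 5pm.", [0.0, 1.0]),
]
route, hits = retrieve_or_fallback([0.9, 0.1], chunks, k=1)
print(route, hits)
```

Returning a route label instead of calling Wikipedia directly keeps the decision testable and mirrors how a graph node would hand control to the next agent.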

🚀 Live Demo

🖥️ Try it now: AutoDocThinker: Agentic RAG System with Intelligent Search Engine

⚙️ Features & Functionalities

| # | Module | Technology Stack | Implementation Details |
|---|--------|------------------|------------------------|
| 1 | LLM Processing | Groq + LLaMA-3-70B | Temperature set to 0.2, with capped output token limits |
| 2 | Document Parsing | PyMuPDF + python-docx | Parses PDF, DOCX, and TXT files while preserving metadata |
| 3 | Text Chunking | RecursiveCharacterTextSplitter | 500-character chunks with 20% (100-character) overlap to preserve context across chunk boundaries |
| 4 | Vector Embeddings | all-MiniLM-L6-v2 | Compact 384-dimensional sentence embeddings |
| 5 | Vector Database | ChromaDB | Local persistent storage with cosine similarity search |
| 6 | Agent Workflow | LangGraph | 7 specialized nodes with conditional routing |
| 7 | Planner Agent | LangGraph planner node | Generates execution plans |
| 8 | Executor Agent | LangGraph node | Orchestrates tool calls |
| 9 | Web Fallback | Wikipedia API | Triggered automatically when document retrieval confidence falls below a threshold |
| 10 | Memory System | `deque(maxlen=3)` | Rolling buffer of the 3 most recent conversation entries |
| 11 | User Interface | HTML, CSS, Bootstrap, JS | Interactive web app accepting file uploads, URLs, and pasted text |
| 12 | Containerization | Docker | Portable deployment |
| 13 | CI/CD Pipeline | GitHub Actions | Automated dependency installation and flake8 linting on pushes and pull requests |
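To make the chunking numbers in row 3 concrete: 20% of 500 characters is a 100-character overlap, so each chunk starts 400 characters after the previous one. The sketch below is a plain fixed-window splitter that illustrates only that size/overlap arithmetic. It is not LangChain's `RecursiveCharacterTextSplitter`, which additionally tries to break on separators such as paragraphs, newlines, and spaces. In LangChain the equivalent settings would be passed as `chunk_size=500, chunk_overlap=100`. `sliding_window_chunks` is a hypothetical helper.

```python
from typing import List

CHUNK_SIZE = 500                        # characters per chunk (feature table, row 3)
CHUNK_OVERLAP = int(CHUNK_SIZE * 0.2)   # 20% overlap -> 100 characters


def sliding_window_chunks(text: str, size: int = CHUNK_SIZE,
                          overlap: int = CHUNK_OVERLAP) -> List[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap  # with the defaults, each chunk starts 400 characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# 1,200 characters of cycling letters so the overlapping regions are distinguishable.
sample = "".join(chr(97 + i % 26) for i in range(1200))
parts = sliding_window_chunks(sample)
print([len(p) for p in parts])
```

The overlap means a sentence cut at a chunk boundary still appears intact in at least one neighbouring chunk, which is what "for context" refers to.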
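Row 10 lists `deque(maxlen=3)` as the memory system. A `deque` with `maxlen` drops its oldest item automatically once it is full, which yields a rolling short-term memory with no manual trimming. This minimal sketch assumes each entry is a `(question, answer)` pair and that the buffer is rendered as plain text ahead of the next prompt. Both the entry shape and the helper names `remember` and `history_as_prompt` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

# Rolling conversation memory: only the 3 most recent exchanges survive.
# Storing (question, answer) tuples is an assumption for illustration.
history: Deque[Tuple[str, str]] = deque(maxlen=3)


def remember(question: str, answer: str) -> None:
    """Append one exchange; when the deque is full, the oldest entry is discarded."""
    history.append((question, answer))


def history_as_prompt() -> str:
    """Render buffered exchanges as plain text to prepend to the next LLM prompt."""
    lines: List[str] = []
    for question, answer in history:
        lines.append(f"User: {question}")
        lines.append(f"Assistant: {answer}")
    return "\n".join(lines)


# Four exchanges into a 3-slot buffer: the first one is evicted.
for i in range(1, 5):
    remember(f"question {i}", f"answer {i}")
print(history_as_prompt())
```

A small fixed window keeps prompts short and token costs predictable, at the price of forgetting anything older than three exchanges (the "Long-term memory" item under Future Enhancements addresses that limit).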

🧱 Project Structure

```
AutoDocThinker/
├── .github/
│   └── workflows/
│       └── main.yml
│
├── agents/
│   ├── __init__.py
│   ├── document_processor.py
│   └── orchestration.py
│
├── data/
│   └── sample.pdf
│
├── notebooks/
│   └── experiment.ipynb
│
├── static/
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── script.js
│
├── templates/
│   └── index.html
│
├── tests/
│   └── test_app.py
│
├── uploads/
│
├── vector_db/
│   └── chroma_collection/
│       └── chroma.sqlite3
│
├── app.log
├── app.py
├── demo.mp4
├── demo.png
├── Dockerfile
├── LICENSE
├── render.yaml
├── README.md
├── requirements.txt
└── setup.py
```

🧱 System Architecture

```mermaid
%% Agentic RAG System Architecture - Colorful Version
graph TD
    A[User Interface]:::ui -->|Upload/Input| B[Flask Web Server]:::server
    B --> C[Tool Router Agent]:::router
    C -->|File| D[Document Processor]:::processor
    C -->|URL| E[Web Scraper]:::scraper
    C -->|Text| F[Text Preprocessor]:::preprocessor

    D --> G[PDF/DOCX/TXT Parser]:::parser
    E --> H[URL Content Extractor]:::extractor
    F --> I[Text Chunker]:::chunker

    G --> J[Chunking & Embedding]:::embedding
    H --> J
    I --> J

    J --> K[Vector Database]:::database

    B -->|Query| L[Planner Agent]:::planner
    L -->|Has Documents| M[Retriever Agent]:::retriever
    L -->|No Documents| N[Fallback Agent]:::fallback

    M --> K
    K --> O[LLM Answer Agent]:::llm
    N --> P[Wikipedia API]:::api
    P --> O

    O --> Q[Response Formatter]:::formatter
    Q --> B
    B --> A

    classDef ui fill:#4e79a7,color:white,stroke:#333;
    classDef server fill:#f28e2b,color:white,stroke:#333;
    classDef router fill:#e15759,color:white,stroke:#333;
    classDef processor fill:#76b7b2,color:white,stroke:#333;
    classDef scraper fill:#59a14f,color:white,stroke:#333;
    classDef preprocessor fill:#edc948,color:#333,stroke:#333;
    classDef parser fill:#b07aa1,color:white,stroke:#333;
    classDef extractor fill:#ff9da7,color:#333,stroke:#333;
    classDef chunker fill:#9c755f,color:white,stroke:#333;
    classDef embedding fill:#bab0ac,color:#333,stroke:#333;
    classDef database fill:#8cd17d,color:#333,stroke:#333;
    classDef planner fill:#499894,color:white,stroke:#333;
    classDef retriever fill:#86bcb6,color:#333,stroke:#333;
    classDef fallback fill:#f1ce63,color:#333,stroke:#333;
    classDef llm fill:#d37295,color:white,stroke:#333;
    classDef api fill:#a0d6e5,color:#333,stroke:#333;
    classDef formatter fill:#b3b3b3,color:#333,stroke:#333;
```
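The diagram contains two routing decisions. The Tool Router picks an ingestion branch by input type, and the Planner Agent chooses between the Retriever and the Wikipedia fallback. The following is a minimal plain-Python sketch of both decisions. It assumes that file type is judged by extension and that a URL means an `http`/`https` address with a host. The function and route names are hypothetical and are not taken from `agents/orchestration.py`. In the project these decisions are LangGraph conditional edges, not standalone functions.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extensions the diagram's PDF/DOCX/TXT parser handles.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def route_input(value: str, is_file: bool = False) -> str:
    """Tool Router: choose the ingestion branch for an upload or typed input."""
    if is_file:
        ext = Path(value).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"unsupported file type: {ext or '(none)'}")
        return "document_processor"
    parsed = urlparse(value.strip())
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "web_scraper"
    # Anything that is neither a file nor a web address is treated as raw text.
    return "text_preprocessor"


def route_query(num_indexed_chunks: int) -> str:
    """Planner: retrieve when documents are indexed, otherwise use the fallback."""
    return "retriever" if num_indexed_chunks > 0 else "fallback"


print(route_input("report.PDF", is_file=True))
print(route_input("https://example.com/page"))
print(route_query(0))
```

Lower-casing the extension lets `report.PDF` and `report.pdf` take the same branch, and rejecting unknown extensions early keeps unsupported files away from the parsers.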

🌍 Real-World Applications

Corporate HR Automation
Legal Document Review
Academic Research
Customer Support
Healthcare Compliance
Financial Analysis
Media Monitoring
Education
Technical Documentation
Government Transparency

📥 Installation

```bash
# 1. Clone the repository
git clone https://github.com/Md-Emon-Hasan/AutoDocThinker.git
cd AutoDocThinker

# 2. Install dependencies
pip install -r requirements.txt
```

Or with Docker:

```bash
# Build the Docker image
docker build -t auto-doc-thinker .

# Run the container
docker run -p 8501:8501 auto-doc-thinker
```

🔁 GitHub Actions CI/CD

.github/workflows/main.yml

```yaml
name: CI

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build-and-test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Lint with flake8
        run: |
          pip install flake8
          flake8 .
```

📝 Future Enhancements

✅ Multilingual document ingestion
✅ Audio document ingestion with Whisper transcription
⏳ Long-term memory and a conversation history viewer
⏳ MongoDB or FAISS as an alternative to ChromaDB
✅ Additional tools (WolframAlpha, SerpAPI)
⏳ Model selection dropdown (Gemini, LLaMA, GPT-4)

👨‍💻 Author

Md Emon Hasan 📧 Email: email 🔗 LinkedIn: md-emon-hasan 🔗 GitHub: Md-Emon-Hasan 🔗 Facebook: mdemon.hasan2001/ 🔗 WhatsApp: 8801834363533