
🚀 Supercharge your RAG system with binary quantization: ~10x faster retrieval and 32x smaller vector storage. Complete implementation with Milvus, OpenAI embeddings, and practical examples.


Talk to PDF - Binary Quantization RAG Application

A Streamlit application that lets you upload PDF files and query them in natural language, using binary quantization for efficient vector storage and retrieval.

🌟 Features

  • 📄 Multi-PDF Upload: Upload one or multiple PDF files simultaneously
  • 🔧 Binary Quantization: Efficient embedding storage using binary quantization
  • 💬 Interactive Chat: Natural language conversation with your PDFs
  • ⏱️ Response Time Tracking: Real-time performance metrics in milliseconds
  • 📋 PDF Preview: File details including page count and size
  • 🗂️ Vector Database: Milvus-powered semantic search
  • 🤖 Advanced LLM: Groq integration for fast response generation

🛠️ Technology Stack

  • Frontend: Streamlit
  • Embeddings: OpenAI text-embedding-3-small
  • Vector Database: Milvus with HAMMING distance
  • LLM: Groq (moonshotai/kimi-k2-instruct)
  • PDF Processing: PyPDF2
  • Binary Quantization: NumPy-based optimization

📁 Project Structure

boost-rag-with-binary-quantization/
├── streamlit_main.py          # Main Streamlit application
├── embedding.py               # Binary quantization embedding logic
├── retriever_llm_index.py     # Retrieval and LLM integration
├── requirements.txt           # Python dependencies
├── run_app.sh                 # Application launcher script
├── .env.example               # Environment variables template
├── docker-compose.yml         # Docker configuration
└── docs/                      # PDF documents directory
    └── llm.pdf

🚀 Quick Start

1. Environment Setup

# Clone or navigate to the project directory
cd boost-rag-with-binary-quantization

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Environment Variables

Create a .env file based on .env.example:

cp .env.example .env

Edit .env and add your API keys:

OPENAI_API_KEY=your_openai_api_key_here
GROQ_API_KEY=your_groq_api_key_here

3. Run the Application

Option A: Using the run script

./run_app.sh

Option B: Direct Streamlit command

streamlit run streamlit_main.py

The application will open in your browser at http://localhost:8501

📖 How to Use

Step 1: Upload PDFs

  1. Use the sidebar to upload one or multiple PDF files
  2. View file details in the PDF Preview section
  3. See the number of text chunks extracted from each file

Step 2: Create Embeddings

  1. Click the "🔧 Create Embeddings" button in the sidebar
  2. Wait for the binary quantization process to complete
  3. The system will create a Milvus vector database with your content

Step 3: Chat with Your PDFs

  1. Use the chat interface in the main area
  2. Ask questions about your uploaded PDFs
  3. View response times for each interaction
  4. Clear chat history when needed

🔧 Technical Details

Binary Quantization Process

  1. Text Extraction: PDFs are processed and split into chunks
  2. Float32 Embeddings: Generated using OpenAI's text-embedding-3-small
  3. Binary Conversion: Float values > 0 become 1, others become 0
  4. Byte Packing: Binary vectors are packed into bytes for storage
  5. Milvus Storage: Stored with HAMMING distance indexing
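Steps 3 and 4 above can be sketched with NumPy (a minimal illustration; `quantize_embedding` is a hypothetical helper name, and the 1536-dimension size matches what text-embedding-3-small returns):

```python
import numpy as np

def quantize_embedding(embedding: np.ndarray) -> bytes:
    """Convert a float32 embedding into a packed binary vector."""
    # Step 3: values > 0 map to 1, everything else to 0
    bits = (embedding > 0).astype(np.uint8)
    # Step 4: pack 8 bits per byte for compact storage
    return np.packbits(bits).tobytes()

# Example: a 1536-dim float32 vector (text-embedding-3-small size)
vec = np.random.default_rng(0).standard_normal(1536).astype(np.float32)
packed = quantize_embedding(vec)
print(len(packed))  # 192 bytes, versus 6144 bytes as float32
```

The packed bytes are what gets inserted into Milvus as a BINARY_VECTOR field.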

Performance Benefits

  • Storage Efficiency: 32x reduction in storage space
  • Query Speed: Similarity search reduces to bitwise XOR and popcount, which CPUs execute far faster than float32 dot products
  • Memory Usage: Significantly reduced RAM requirements
  • Scalability: Better performance with large document collections
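The 32x figure follows directly from the bit widths: float32 spends 32 bits per dimension, the binary code spends 1. For a 1536-dimensional embedding:

```python
dims = 1536                   # text-embedding-3-small dimensionality
float32_bytes = dims * 4      # 32 bits per dimension
binary_bytes = dims // 8      # 1 bit per dimension, packed 8 per byte
ratio = float32_bytes // binary_bytes
print(float32_bytes, binary_bytes, ratio)  # 6144 bytes vs 192 bytes -> 32x
```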

🎛️ Configuration Options

Embedding Model

embedding_model = OpenAIEmbedding(model="text-embedding-3-small")

LLM Configuration

llm = Groq(
    model="moonshotai/kimi-k2-instruct",
    api_key=os.environ.get("GROQ_API_KEY"),
    temperature=0.5,
    max_tokens=1000
)

Vector Search Parameters

search_params = {"metric_type": "HAMMING"}
limit = 5  # Number of retrieved documents
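Milvus computes the HAMMING metric server-side, but conceptually it is equivalent to this NumPy sketch of top-k retrieval over packed binary vectors (`hamming_top_k` is an illustrative helper, not part of the app):

```python
import numpy as np

def hamming_top_k(query: bytes, corpus: list[bytes], k: int = 5) -> list[int]:
    """Return indices of the k nearest packed binary vectors by Hamming distance."""
    q = np.frombuffer(query, dtype=np.uint8)
    db = np.frombuffer(b"".join(corpus), dtype=np.uint8).reshape(len(corpus), -1)
    # XOR highlights differing bits; unpackbits + sum counts them per row
    dists = np.unpackbits(np.bitwise_xor(db, q), axis=1).sum(axis=1)
    return np.argsort(dists, kind="stable")[:k].tolist()

docs = [bytes([0b11110000]), bytes([0b10101010]), bytes([0b11111111])]
print(hamming_top_k(bytes([0b11110000]), docs, k=2))  # -> [0, 1]
```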

🐳 Docker Support

Run with Docker Compose:

docker-compose up -d

📊 Performance Monitoring

The application tracks and displays:

  • Response Time: LLM generation time in milliseconds
  • Embedding Creation: Progress and completion status
  • File Processing: Upload and parsing status
  • Vector Search: Retrieval performance
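Millisecond response times can be captured with a simple perf_counter wrapper (a sketch; the app's actual instrumentation may differ):

```python
import time

def timed_ms(fn, *args, **kwargs):
    """Run fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

result, ms = timed_ms(sum, range(1_000_000))
print(f"sum took {ms:.1f} ms")
```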

🔍 Troubleshooting

Common Issues

  1. Missing API Keys

    • Ensure .env file exists with valid API keys
    • Check OpenAI and Groq API key formats
  2. PDF Processing Errors

    • Verify PDF files are not corrupted
    • Check file size limitations
    • Ensure PDFs contain extractable text (scanned, image-only PDFs need OCR first)
  3. Vector Database Issues

    • Delete milvus_data.db and recreate embeddings
    • Check disk space availability
    • Verify Milvus dependencies
  4. Performance Issues

    • Reduce number of documents retrieved (limit parameter)
    • Use smaller PDF files for testing
    • Monitor system memory usage

🔄 Development

Code Structure

  • streamlit_main.py: Main application with UI components
  • Binary quantization functions: Embedding conversion logic
  • Vector store management: Milvus collection handling
  • Chat interface: Message history and response generation

Key Functions

  • extract_text_from_pdf(): PDF text extraction
  • create_binary_embeddings(): Embedding quantization
  • setup_vector_store(): Milvus database setup
  • retrieve_context(): Semantic search
  • generate_response(): LLM interaction
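The chunking step that feeds extract_text_from_pdf's output into the embedder can be sketched as fixed-size character windows with overlap (`chunk_text` and its parameter values are illustrative assumptions, not the app's exact implementation):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    step = chunk_size - overlap
    # Overlap preserves context that would otherwise be cut at chunk boundaries
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("A" * 2500, chunk_size=1000, overlap=200)
print(len(chunks))  # -> 3
```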

📝 API Keys Required

  1. OpenAI API Key: For text embeddings

  2. Groq API Key: For LLM inference

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📄 License

This project is open source. Please check the license file for details.

🆘 Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review the code documentation
  3. Open an issue on the repository

Happy Chatting with your PDFs! 🎉