
🚀 Supercharge your RAG system with binary quantization: ~10x faster retrieval and 32x smaller vector storage. Complete implementation with Milvus, OpenAI embeddings, and practical examples.


Talk to PDF - Binary Quantization RAG Application

A Streamlit application that lets you upload PDF files and query them in natural language, using binary quantization for efficient vector storage and retrieval.

🌟 Features

  • 📄 Multi-PDF Upload: Upload one or multiple PDF files simultaneously
  • 🔧 Binary Quantization: Efficient embedding storage using binary quantization
  • 💬 Interactive Chat: Natural language conversation with your PDFs
  • ⏱️ Response Time Tracking: Real-time performance metrics in milliseconds
  • 📋 PDF Preview: File details including page count and size
  • 🗂️ Vector Database: Milvus-powered semantic search
  • 🤖 Advanced LLM: Groq integration for fast response generation

🛠️ Technology Stack

  • Frontend: Streamlit
  • Embeddings: OpenAI text-embedding-3-small
  • Vector Database: Milvus with HAMMING distance
  • LLM: Groq (moonshotai/kimi-k2-instruct)
  • PDF Processing: PyPDF2
  • Binary Quantization: NumPy-based optimization

📁 Project Structure

boost-rag-with-binary-quantization/
├── streamlit_main.py          # Main Streamlit application
├── embedding.py               # Binary quantization embedding logic
├── retriever_llm_index.py     # Retrieval and LLM integration
├── requirements.txt           # Python dependencies
├── run_app.sh                 # Application launcher script
├── .env.example               # Environment variables template
├── docker-compose.yml         # Docker configuration
└── docs/                      # PDF documents directory
    └── llm.pdf

🚀 Quick Start

1. Environment Setup

# Clone or navigate to the project directory
cd boost-rag-with-binary-quantization

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Environment Variables

Create a .env file based on .env.example:

cp .env.example .env

Edit .env and add your API keys:

OPENAI_API_KEY=your_openai_api_key_here
GROQ_API_KEY=your_groq_api_key_here

3. Run the Application

Option A: Using the run script

./run_app.sh

Option B: Direct Streamlit command

streamlit run streamlit_main.py

The application will open in your browser at http://localhost:8501

📖 How to Use

Step 1: Upload PDFs

  1. Use the sidebar to upload one or multiple PDF files
  2. View file details in the PDF Preview section
  3. See the number of text chunks extracted from each file

Step 2: Create Embeddings

  1. Click the "🔧 Create Embeddings" button in the sidebar
  2. Wait for the binary quantization process to complete
  3. The system will create a Milvus vector database with your content

Step 3: Chat with Your PDFs

  1. Use the chat interface in the main area
  2. Ask questions about your uploaded PDFs
  3. View response times for each interaction
  4. Clear chat history when needed

🔧 Technical Details

Binary Quantization Process

  1. Text Extraction: PDFs are processed and split into chunks
  2. Float32 Embeddings: Generated using OpenAI's text-embedding-3-small
  3. Binary Conversion: Float values > 0 become 1, others become 0
  4. Byte Packing: Binary vectors are packed into bytes for storage
  5. Milvus Storage: Stored with HAMMING distance indexing
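Steps 3 and 4 above can be sketched with NumPy (a minimal illustration; `quantize_embedding` is a hypothetical helper name, and the 1536-dimension size matches what text-embedding-3-small returns):

```python
import numpy as np

def quantize_embedding(embedding: np.ndarray) -> bytes:
    """Convert a float32 embedding into a packed binary vector."""
    # Step 3: values > 0 map to 1, everything else to 0
    bits = (embedding > 0).astype(np.uint8)
    # Step 4: pack 8 bits per byte for compact storage
    return np.packbits(bits).tobytes()

# Example: a 1536-dim float32 vector (text-embedding-3-small size)
vec = np.random.default_rng(0).standard_normal(1536).astype(np.float32)
packed = quantize_embedding(vec)
print(len(packed))  # 192 bytes, versus 6144 bytes as float32
```

The packed bytes are what gets inserted into Milvus as a BINARY_VECTOR field.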

Performance Benefits

  • Storage Efficiency: 32x reduction in storage space
  • Query Speed: Similarity search reduces to bitwise XOR and popcount, which CPUs execute far faster than float32 dot products
  • Memory Usage: Significantly reduced RAM requirements
  • Scalability: Better performance with large document collections
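The 32x figure follows directly from the bit widths: float32 spends 32 bits per dimension, the binary code spends 1. For a 1536-dimensional embedding:

```python
dims = 1536                   # text-embedding-3-small dimensionality
float32_bytes = dims * 4      # 32 bits per dimension
binary_bytes = dims // 8      # 1 bit per dimension, packed 8 per byte
ratio = float32_bytes // binary_bytes
print(float32_bytes, binary_bytes, ratio)  # 6144 bytes vs 192 bytes -> 32x
```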

🎛️ Configuration Options

Embedding Model

embedding_model = OpenAIEmbedding(model="text-embedding-3-small")

LLM Configuration

llm = Groq(
    model="moonshotai/kimi-k2-instruct",
    api_key=os.environ.get("GROQ_API_KEY"),
    temperature=0.5,
    max_tokens=1000
)

Vector Search Parameters

search_params = {"metric_type": "HAMMING"}
limit = 5  # Number of retrieved documents
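Milvus computes the HAMMING metric server-side, but conceptually it is equivalent to this NumPy sketch of top-k retrieval over packed binary vectors (`hamming_top_k` is an illustrative helper, not part of the app):

```python
import numpy as np

def hamming_top_k(query: bytes, corpus: list[bytes], k: int = 5) -> list[int]:
    """Return indices of the k nearest packed binary vectors by Hamming distance."""
    q = np.frombuffer(query, dtype=np.uint8)
    db = np.frombuffer(b"".join(corpus), dtype=np.uint8).reshape(len(corpus), -1)
    # XOR highlights differing bits; unpackbits + sum counts them per row
    dists = np.unpackbits(np.bitwise_xor(db, q), axis=1).sum(axis=1)
    return np.argsort(dists, kind="stable")[:k].tolist()

docs = [bytes([0b11110000]), bytes([0b10101010]), bytes([0b11111111])]
print(hamming_top_k(bytes([0b11110000]), docs, k=2))  # -> [0, 1]
```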

🐳 Docker Support

Run with Docker Compose:

docker-compose up -d

📊 Performance Monitoring

The application tracks and displays:

  • Response Time: LLM generation time in milliseconds
  • Embedding Creation: Progress and completion status
  • File Processing: Upload and parsing status
  • Vector Search: Retrieval performance
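Millisecond response times can be captured with a simple perf_counter wrapper (a sketch; the app's actual instrumentation may differ):

```python
import time

def timed_ms(fn, *args, **kwargs):
    """Run fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

result, ms = timed_ms(sum, range(1_000_000))
print(f"sum took {ms:.1f} ms")
```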

🔍 Troubleshooting

Common Issues

  1. Missing API Keys

    • Ensure .env file exists with valid API keys
    • Check OpenAI and Groq API key formats
  2. PDF Processing Errors

    • Verify PDF files are not corrupted
    • Check file size limitations
    • Ensure PDFs contain extractable text (scanned, image-only PDFs need OCR first)
  3. Vector Database Issues

    • Delete milvus_data.db and recreate embeddings
    • Check disk space availability
    • Verify Milvus dependencies
  4. Performance Issues

    • Reduce number of documents retrieved (limit parameter)
    • Use smaller PDF files for testing
    • Monitor system memory usage

🔄 Development

Code Structure

  • streamlit_main.py: Main application with UI components
  • Binary quantization functions: Embedding conversion logic
  • Vector store management: Milvus collection handling
  • Chat interface: Message history and response generation

Key Functions

  • extract_text_from_pdf(): PDF text extraction
  • create_binary_embeddings(): Embedding quantization
  • setup_vector_store(): Milvus database setup
  • retrieve_context(): Semantic search
  • generate_response(): LLM interaction
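The chunking step that feeds extract_text_from_pdf's output into the embedder can be sketched as fixed-size character windows with overlap (`chunk_text` and its parameter values are illustrative assumptions, not the app's exact implementation):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    step = chunk_size - overlap
    # Overlap preserves context that would otherwise be cut at chunk boundaries
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("A" * 2500, chunk_size=1000, overlap=200)
print(len(chunks))  # -> 3
```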

📝 API Keys Required

  1. OpenAI API Key: For text embeddings

  2. Groq API Key: For LLM inference

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📄 License

This project is open source. Please check the license file for details.

🆘 Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review the code documentation
  3. Open an issue on the repository

Happy Chatting with your PDFs! 🎉