A powerful Retrieval-Augmented Generation (RAG) system combining Colpali's ColQwen image embeddings with LLaMA Vision via Ollama.
- 🧬 ColQwen model for generating powerful image embeddings via Colpali
- 🤖 LLaMA Vision integration through Ollama for image understanding
- 📥 Intelligent image indexing with duplicate detection
- 💬 Natural language image queries
- 📄 PDF document support
- 🔍 Semantic similarity search
- 📊 Efficient SQLite storage
- Embedding Model: ColQwen via Colpali
- Vision Model: LLaMA Vision via Ollama
- Frontend: Streamlit
- Database: SQLite
- Image Processing: Pillow, pdf2image
- ML Framework: PyTorch
- Install Poppler (required for PDF support):
Mac:
brew install poppler
Windows:
- Download the latest poppler package from: https://github.com/oschwartz10612/poppler-windows/releases/
- Extract the downloaded zip to a location (e.g., C:\Program Files\poppler)
- Add the bin directory to PATH:
  - Open System Properties > Advanced > Environment Variables
  - Under System Variables, find and select "Path"
  - Click "Edit" > "New"
  - Add the bin path (e.g., C:\Program Files\poppler\bin)
- Verify installation:
pdftoppm -h
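Beyond running `pdftoppm -h` in a shell, you can confirm from Python that the Poppler binary is visible on PATH, which is what pdf2image ultimately needs. This is a small stdlib-only sanity check; `check_poppler` is a name used here for illustration, not part of the app:

```python
import shutil

def check_poppler() -> bool:
    """Return True if Poppler's pdftoppm binary is discoverable on PATH."""
    return shutil.which("pdftoppm") is not None

if __name__ == "__main__":
    print("Poppler found" if check_poppler() else "Poppler NOT on PATH")
```

If this prints "Poppler NOT on PATH" after installation, revisit the PATH steps above and restart your terminal.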
- Clone the repository and set up the environment:
  git clone https://github.com/kturung/colpali-llama-vision-rag.git
  cd colpali-llama-vision-rag
  python -m venv venv
  source venv/bin/activate   # For Mac/Linux
  .\venv\Scripts\activate    # For Windows
  pip install -r requirements.txt
- Install Ollama from https://ollama.com
- Launch the application:
  streamlit run app.py
Note: Restart your terminal/IDE after modifying PATH variables
- Navigate to "➕ Add to Index"
- Upload images/PDFs
- System automatically:
- Generates ColQwen embeddings
- Checks for duplicates
- Stores in SQLite
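The duplicate-detection step above can be sketched as follows. This is a minimal stdlib-only illustration, not the app's actual code: it assumes images are deduplicated by hashing their raw bytes (SHA-256 here), which matches the UNIQUE image_hash column in the schema further down.

```python
import hashlib

def image_hash(image_bytes: bytes) -> str:
    """Content hash used as the duplicate-detection key (SHA-256 assumed)."""
    return hashlib.sha256(image_bytes).hexdigest()

def is_duplicate(image_bytes: bytes, seen_hashes: set) -> bool:
    """Check the hash against already-indexed images; record it if new."""
    h = image_hash(image_bytes)
    if h in seen_hashes:
        return True
    seen_hashes.add(h)
    return False
```

Hashing the bytes (rather than comparing pixels or filenames) makes the check fast and independent of how the file is named or re-uploaded.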
- Go to "🔍 Query Index"
- Enter natural language query
- View similar images
- Get LLaMA Vision analysis
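The retrieval step can be approximated with a simple cosine-similarity ranking over stored embeddings. Note this is a simplification for illustration: ColQwen actually produces multi-vector embeddings scored by late interaction, but a single-vector cosine ranking conveys the idea, and all names here (`top_k`, the tuple layout) are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_emb, indexed, k=3):
    """Rank stored (image_id, embedding) pairs by similarity to the query."""
    scored = [(img_id, cosine(query_emb, emb)) for img_id, emb in indexed]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```

The top-ranked images are then passed to LLaMA Vision for the natural-language analysis step.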
CREATE TABLE embeddings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
image_base64 TEXT,
image_hash TEXT UNIQUE,
embedding BLOB
)
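A minimal sketch of how this schema supports duplicate detection, using only the stdlib sqlite3 module (the hashing and embedding-serialization details are assumptions; the app may differ): with `INSERT OR IGNORE`, the UNIQUE constraint on image_hash silently skips images that were already indexed.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")  # the app would use a file-backed DB
conn.execute("""
    CREATE TABLE embeddings (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        image_base64 TEXT,
        image_hash TEXT UNIQUE,
        embedding BLOB
    )
""")

def store(image_b64: str, embedding: bytes) -> bool:
    """Insert an image row; return False if its hash was already indexed."""
    h = hashlib.sha256(image_b64.encode()).hexdigest()
    cur = conn.execute(
        "INSERT OR IGNORE INTO embeddings (image_base64, image_hash, embedding)"
        " VALUES (?, ?, ?)",
        (image_b64, h, embedding),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows modified means the UNIQUE constraint fired

store("aGVsbG8=", b"\x00\x01")  # new image: stored
store("aGVsbG8=", b"\x00\x01")  # same content: skipped
```

Storing the embedding as a BLOB keeps the whole index in a single portable SQLite file.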