A flexible full-stack platform for hybrid semantic search combining vector similarity and keyword matching, with support for custom metadata schemas.
Hybrid Search Algorithm
- Vector similarity using OpenAI embeddings
- Keyword matching with BM25 ranking
- Configurable weighting between vector and keyword scores
- Semantic reranking via Cohere
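The weighting scheme between the two signals is not spelled out here. The following is a minimal sketch under stated assumptions: a textbook Okapi BM25 with lowercase whitespace tokenization on the keyword side, min-max normalization of both score lists, and a convex combination controlled by a hypothetical `alpha` parameter (1.0 means pure vector similarity, 0.0 means pure BM25). The `vector_scores` input stands in for the similarity values a Pinecone query over OpenAI embeddings would return. `bm25_scores`, `min_max`, and `hybrid_scores` are illustrative names, not the project's API.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query (lowercased, whitespace-tokenized)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs  # average document length
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(query.lower().split())
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            freq = tf[term]
            if freq == 0:
                continue
            # IDF with "+ 1" inside the log so very common terms never score negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = freq + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(values: list[float]) -> list[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(vector_scores: list[float], keyword_scores: list[float], alpha: float = 0.5) -> list[float]:
    """Convex combination: alpha weights the vector side, 1 - alpha the keyword side."""
    v = min_max(vector_scores)
    k = min_max(keyword_scores)
    return [alpha * vs + (1 - alpha) * ks for vs, ks in zip(v, k)]
```

Normalizing before mixing matters because these BM25 scores are unbounded while cosine similarities lie in [-1, 1]; without it, a given `alpha` would not mean the same thing from one query to the next. Reciprocal rank fusion, which combines ranks instead of raw scores, is a common alternative.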
Interactive Web Interface
- Real-time search configuration
- Detailed analytics dashboard
- Dynamic result visualization
- Configurable result display
Robust Backend
- Flask-based REST API
- Modular connector architecture
- Structured error handling
- Comprehensive logging
Prerequisites
- Python 3.12+
- Pinecone account
- OpenAI API key
- Cohere API key
Installation
- Clone the repository:
    git clone https://github.com/yourusername/hybrid-search.git
    cd hybrid-search
- Create and activate a conda environment:
    conda env create -f config/environment.yaml
    conda activate semantic
- Set up environment variables:
    cp .env.example .env
  Then edit .env with your API keys:
    OPENAI_API_KEY=your_openai_key
    PINECONE_API_KEY=your_pinecone_key
    PINECONE_INDEX_NAME=your_index_name
    DATA_CSV_PATH=path/to/your/data.csv
- Start the Flask server:
    python src/app.py
This project is actively being developed with a focus on making metadata schemas more pluggable and extensible. Future updates will include:
- Schema discovery and auto-registration
- Additional metadata validators
- Enhanced schema documentation
- More example implementations
- Schema migration tools
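The schema discovery and validator items above imply some kind of schema registry. Purely as an illustration of one common Python pattern, not the project's current or planned design, the sketch below auto-registers every subclass of a hypothetical `MetadataSchema` base class through `__init_subclass__` and checks metadata dictionaries against declared field types. `SCHEMA_REGISTRY`, `MetadataSchema`, and `ArticleSchema` are invented names.

```python
from typing import Any

# Hypothetical registry mapping schema names to schema classes.
SCHEMA_REGISTRY: dict[str, type] = {}


class MetadataSchema:
    """Base class: subclasses set a unique name and the fields they require."""

    name: str = ""
    fields: dict[str, type] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Only register classes that define their own name (not one inherited from a parent).
        name = cls.__dict__.get("name")
        if name:
            if name in SCHEMA_REGISTRY:
                raise ValueError(f"duplicate schema name: {name!r}")
            SCHEMA_REGISTRY[name] = cls

    @classmethod
    def validate(cls, metadata: dict[str, Any]) -> list[str]:
        """Return human-readable problems; an empty list means the metadata is valid."""
        errors = []
        for key, expected_type in cls.fields.items():
            if key not in metadata:
                errors.append(f"missing field '{key}'")
            elif not isinstance(metadata[key], expected_type):
                errors.append(f"field '{key}' should be {expected_type.__name__}")
        return errors


class ArticleSchema(MetadataSchema):
    name = "article"
    fields = {"title": str, "year": int}
```

With this pattern, defining a schema class is enough to make it discoverable by name, so no separate registration call can be forgotten.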
This project relies on the following third-party services and libraries:
- OpenAI - Text embedding generation
- Cohere - Semantic reranking
- Pinecone - Vector database and similarity search
- Flask - Web framework
- Tailwind CSS - UI styling
This project is licensed under the MIT License. See the LICENSE file for details.