Embedding Model and Search Strategy Evaluator

This project provides a tool for evaluating various embedding models and search strategies using a Streamlit web application. It allows users to upload a corpus, select multiple embedding models, and compare different search strategies for information retrieval tasks.

Features

Upload custom corpus (TXT file)
Support for multiple embedding models:
- TensorFlow Serving models (via TensorFlow Serving API)
- Sentence Transformer models (via Sentence Transformers library)
Evaluate various search strategies:
- Exact Match
- Prefix Match
- Fuzzy Match
- BM25 Retrieval
- Semantic Search (using selected embedding models)
Interactive weight adjustment for each search strategy
Real-time visualization of search results and execution times
On-demand downloading of additional Sentence Transformer models
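
The lexical strategies listed above (Exact Match, Prefix Match, Fuzzy Match) can be illustrated with a minimal Python sketch. It is not taken from src/app.py: the function names, the case-insensitive comparison, and the use of difflib for fuzzy matching are all assumptions made for illustration, and the application may score these strategies differently.

```python
from difflib import SequenceMatcher


def exact_match(query: str, doc: str) -> float:
    """Return 1.0 when the document equals the query (ignoring case), else 0.0."""
    return 1.0 if doc.strip().lower() == query.strip().lower() else 0.0


def prefix_match(query: str, doc: str) -> float:
    """Return 1.0 when the document starts with the query (ignoring case), else 0.0."""
    return 1.0 if doc.strip().lower().startswith(query.strip().lower()) else 0.0


def fuzzy_match(query: str, doc: str) -> float:
    """Return a similarity ratio in [0, 1] computed by difflib's SequenceMatcher."""
    return SequenceMatcher(None, query.lower(), doc.lower()).ratio()


# Hypothetical three-line corpus, one document per line as in an uploaded TXT file.
corpus = ["embedding models", "embedding model evaluation", "search strategies"]
query = "embedding model"

for doc in corpus:
    print(
        doc,
        exact_match(query, doc),
        prefix_match(query, doc),
        round(fuzzy_match(query, doc), 2),
    )
```

Exact and prefix matching are binary here, while fuzzy matching returns a graded score, which is one reason per-strategy weights are useful when the results are combined.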

Getting Started

Prerequisites

Docker
Docker Compose

Running the Application

Clone this repository
Navigate to the project directory
Run the following command:

docker compose up --build

Open your browser and go to http://localhost:8502 to access the Streamlit application

Developing with Docker Watch

To have Docker Compose pick up source changes while the application is running, start it in watch mode instead:

docker compose watch

Project Structure

docker-compose.yml: Defines the services for TensorFlow Serving and the Streamlit application
Dockerfile.streamlit: Dockerfile for building the Streamlit application image
src/app.py: Main Streamlit application code
models/: Directory for storing downloaded Sentence Transformer models

Usage

Upload a corpus (TXT file) using the file uploader
Select the embedding models you want to evaluate
Enter a search query
Choose the search strategies to use and adjust their weights
View the search results, including scores for each strategy and overall weighted scores
Analyze the execution time chart for performance comparison
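
As a rough illustration of how per-strategy scores and user-chosen weights could be combined into an overall weighted score, here is a minimal Python sketch. The weighted-average formula, the function name weighted_score, and the strategy keys are assumptions; the application may combine scores differently, for example with a plain weighted sum.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-strategy scores into one value using a weighted average.

    Strategies without a weight count as weight 0. If every weight is 0,
    the result is 0.0 rather than a division error.
    """
    total_weight = sum(weights.get(name, 0.0) for name in scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(score * weights.get(name, 0.0) for name, score in scores.items())
    return weighted_sum / total_weight


# Hypothetical scores for one document and hypothetical slider weights.
scores = {"exact": 0.0, "fuzzy": 0.8, "bm25": 0.5}
weights = {"exact": 1.0, "fuzzy": 1.0, "bm25": 2.0}
print(round(weighted_score(scores, weights), 2))
```

Dividing by the total weight keeps the overall score on the same scale as the individual scores, so changing one weight shifts the ranking without inflating every result.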

Customization

To add new TensorFlow Serving models, update the tf_serving service in docker-compose.yml
To use different Sentence Transformer models, download them using the provided interface in the Streamlit app
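
After adding a model to the tf_serving service, it can help to check that the model responds before selecting it in the app. The sketch below builds a request for TensorFlow Serving's REST predict endpoint using only the Python standard library. The model name my_model, the host, and port 8501 (TensorFlow Serving's default REST port) are assumptions, since the ports exposed in docker-compose.yml are not listed here, and the expected input format depends on the model's signature.

```python
import json
from urllib import request


def build_predict_request(
    model_name: str, texts: list, host: str = "localhost", port: int = 8501
) -> request.Request:
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # The "instances" row format is one of the request formats the REST API accepts.
    body = json.dumps({"instances": texts}).encode("utf-8")
    return request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def embed(model_name: str, texts: list) -> list:
    """Send the request and return the "predictions" field of the response."""
    with request.urlopen(build_predict_request(model_name, texts)) as resp:
        return json.loads(resp.read())["predictions"]


# "my_model" is a hypothetical name; use the one configured for tf_serving.
req = build_predict_request("my_model", ["example sentence"])
print(req.full_url)
```

Calling embed("my_model", ["example sentence"]) sends the request; it only succeeds when the TensorFlow Serving container is running and its REST port is reachable from where the script runs.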

Attributions

Contributing

This is a personal project that helped me quickly evaluate embedding models and search strategies. If you find it useful, feel free to use it for your own purposes. Contributions are welcome, but please note that this project is a scrappy tool, not a production-ready solution.

License

This project is open-source and available under the MIT License.

srps/model-evaluator