This project provides a tool for evaluating various embedding models and search strategies using a Streamlit web application. It allows users to upload a corpus, select multiple embedding models, and compare different search strategies for information retrieval tasks.
Features:
- Upload a custom corpus (TXT file)
- Support for multiple embedding models:
  - TensorFlow Serving models (via the TensorFlow Serving API)
  - Sentence Transformer models (via the Sentence Transformers library)
- Evaluation of several search strategies:
  - Exact Match
  - Prefix Match
  - Fuzzy Match
  - BM25 Retrieval
  - Semantic Search (using the selected embedding models)
- Interactive weight adjustment for each search strategy
- Real-time visualization of search results and execution times
- On-demand downloading of additional Sentence Transformer models
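The lexical strategies listed above can be approximated in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the code in src/app.py: it treats each non-empty line of the uploaded TXT file as one document, uses `difflib.SequenceMatcher` as a stand-in for whatever fuzzy matcher the app uses, and implements Okapi BM25 with the common defaults k1 = 1.5 and b = 0.75. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def split_corpus(raw: bytes) -> list[str]:
    # Assumption: one document per non-empty line of a UTF-8 TXT file.
    return [line.strip() for line in raw.decode("utf-8").splitlines() if line.strip()]


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def exact_match(query: str, doc: str) -> float:
    # 1.0 if the query appears verbatim (case-insensitive) in the document.
    return 1.0 if query.lower() in doc.lower() else 0.0


def prefix_match(query: str, doc: str) -> float:
    # Fraction of query tokens that are a prefix of at least one document token.
    q_tokens, d_tokens = tokenize(query), tokenize(doc)
    if not q_tokens:
        return 0.0
    hits = sum(any(t.startswith(q) for t in d_tokens) for q in q_tokens)
    return hits / len(q_tokens)


def fuzzy_match(query: str, doc: str) -> float:
    # Character-level similarity ratio in [0, 1].
    return SequenceMatcher(None, query.lower(), doc.lower()).ratio()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    # Okapi BM25 score of every document for the query.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # also guarantees len(toks) > 0, so avgdl > 0 below
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

In Streamlit, the bytes for `split_corpus` could come from the uploaded file's `getvalue()`. Note that BM25 scores are unbounded while the other three lie in [0, 1], which matters when the strategies are combined with weights.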
Prerequisites:
- Docker
- Docker Compose
Getting started:
- Clone this repository
- Navigate to the project directory
- Build and start the services: `docker compose up --build`
- Open http://localhost:8502 in your browser to access the Streamlit application
For development, you can instead run `docker compose watch`, which keeps the running services updated as you edit the source files.
Project structure:
- docker-compose.yml: Defines the services for TensorFlow Serving and the Streamlit application
- Dockerfile.streamlit: Dockerfile for building the Streamlit application image
- src/app.py: Main Streamlit application code
- models/: Directory for storing downloaded Sentence Transformer models
Usage:
- Upload a corpus (TXT file) using the file uploader
- Select the embedding models you want to evaluate
- Enter a search query
- Choose the search strategies to use and adjust their weights
- View the search results, including scores for each strategy and overall weighted scores
- Analyze the execution time chart for performance comparison
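The overall weighted score can be understood as a weighted sum of per-strategy scores. The sketch below is one plausible way to compute it, not necessarily what src/app.py does: it assumes each strategy's scores are min-max normalized to [0, 1] first (so an unbounded score such as BM25 does not dominate), that the slider weights are rescaled to sum to 1, and it times each strategy with `time.perf_counter` in the spirit of the execution-time chart. The names `min_max` and `combine` are hypothetical.

```python
import time
from typing import Callable

# A strategy maps (query, docs) to one score per document.
Strategy = Callable[[str, list[str]], list[float]]


def min_max(scores: list[float]) -> list[float]:
    # Rescale to [0, 1]; a constant list carries no ranking signal, so it maps to zeros.
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def combine(
    query: str,
    docs: list[str],
    strategies: dict[str, Strategy],
    weights: dict[str, float],
) -> tuple[list[tuple[float, str]], dict[str, float]]:
    # Run each strategy, record its wall-clock time, and accumulate weighted scores.
    total_weight = sum(weights[name] for name in strategies)
    combined = [0.0] * len(docs)
    timings: dict[str, float] = {}
    for name, strategy in strategies.items():
        start = time.perf_counter()
        scores = min_max(strategy(query, docs))
        timings[name] = time.perf_counter() - start
        w = weights[name] / total_weight if total_weight else 0.0
        combined = [c + w * s for c, s in zip(combined, scores)]
    # Highest combined score first; ties keep corpus order (sorted is stable).
    ranked = sorted(zip(combined, docs), key=lambda pair: pair[0], reverse=True)
    return ranked, timings
```

Any function with the signature `(query, docs) -> list[float]` can be plugged in as a strategy, so lexical and semantic scorers share the same combination step.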
Customization:
- To add new TensorFlow Serving models, update the `tf_serving` service in `docker-compose.yml`
- To use different Sentence Transformer models, download them through the interface provided in the Streamlit app
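Semantic Search ranks documents by how close their embeddings are to the query embedding. As a hedged sketch of the TensorFlow Serving path, the code below assumes the REST API (not gRPC) on TF Serving's default REST port 8501, that the Compose service name `tf_serving` resolves as a hostname from the Streamlit container, and that `my_encoder` is a hypothetical model whose signature accepts raw strings and returns one vector per input. The request uses TF Serving's row-format `{"instances": [...]}` body and reads the `predictions` field of the response; ranking uses plain cosine similarity.

```python
import json
import math
import urllib.request

# Assumptions: Compose service "tf_serving", default REST port 8501,
# and a hypothetical model "my_encoder" that takes raw strings.
TF_SERVING_URL = "http://tf_serving:8501/v1/models/my_encoder:predict"


def build_predict_request(texts: list[str], url: str = TF_SERVING_URL) -> urllib.request.Request:
    # Row-format body for TF Serving's REST predict endpoint.
    body = json.dumps({"instances": texts}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def embed(texts: list[str], url: str = TF_SERVING_URL) -> list[list[float]]:
    # Send the request and return one embedding per input text.
    with urllib.request.urlopen(build_predict_request(texts, url), timeout=30) as resp:
        return json.loads(resp.read())["predictions"]


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_rank(
    query_vec: list[float], doc_vecs: list[list[float]], docs: list[str]
) -> list[tuple[float, str]]:
    # Score every document against the query and sort best-first.
    scores = [cosine(query_vec, v) for v in doc_vecs]
    return sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
```

For a Sentence Transformer model, `SentenceTransformer(name).encode(texts)` produces the embeddings locally instead of `embed`, and the same `semantic_rank` applies.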
This is a personal project that helped me quickly evaluate embedding models and search strategies. If you find it useful, feel free to use it for your own purposes. Contributions are welcome, but please note that this project is a scrappy tool, not a production-ready solution.
This project is open-source and available under the MIT License.