Multi-stage Retrieval using RM3 or SPLADE and T5

An end-to-end Search Engine that can index documents for two-stage retrieval. The system focuses on a multi-stage retrieval architecture with query expansion using SPLADE or RM3 and BM25 for retrieval, and the T5 text-to-text transformer for re-ranking. The proposed framework was evaluated on the Complex Document and Entity Collection (CODEC), which consists of a corpus of social science domains across History, Economics and Politics. CODEC also defines a document ranking and an entity ranking task which align with each other to improve document ranking through entity query expansion and topic modelling.

Getting Started

Fork (Optional) and clone the repository.

git clone --recurse-submodules https://github.com/<username>/multi-stage-retrieval-using-splade-and-t5

Initialise a virtual environment (e.g. venv) and install pre-requisites.

# create a new env (from the repo root)
python3 -m venv venv

# activate env for unix/linux
source venv/bin/activate    

# activate env for windows
./source/Scripts/activate

# install pre-requisites
pip install -r requirements.txt

Try the Search Engine

Make sure you download the whole corpus and save it as CODEC/corpus/codec_documents.jsonl.
Start the API Server (Will automatically build the index, or load pre-built index).

python app.py

Open the localhost:8000 URL in the browser. Select and Model you want to use and type the query in the search box.

Try the Experiments

Make sure you download the whole corpus and save it as CODEC/corpus/codec_documents.jsonl.
Run the following script to show all experiment results and save results to ./results/metrics.csv.

python eval.py

mihirs16/multi-stage-retrieval-using-rm3-and-t5

Multi-stage Retrieval using RM3 or SPLADE and T5

Getting Started

Try the Search Engine

Try the Experiments