This repo contains the PyTorch implementation of ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval.
🤩 An upgraded version of ControlRetriever, I3: Intent-Introspective Retrieval Conditioned on Instructions, has been accepted by SIGIR 2024.
This repo is built on top of beir and pygaggle. Please refer to the corresponding repositories for installation of the Python environment.
1. Download BEIR Data
Please first follow the instructions to download the BEIR data. Then modify `BEIR_DIR` in `scripts/evaluate_retrieval.sh` and `scripts/evaluate_rerank.sh` to point to the folder that contains the BEIR data. Finally, put `data/instructions.jsonl` into the folder that contains the BEIR data.
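Once `data/instructions.jsonl` is in place, the per-dataset instructions can be read in the usual JSONL way. A minimal sketch, assuming one JSON object per line with hypothetical keys `"dataset"` and `"instruction"` (check the actual file for the real field names):

```python
import json
import os
import tempfile

def load_instructions(path):
    """Map each dataset name to its retrieval instruction.

    Assumes one JSON object per line with keys "dataset" and
    "instruction" -- hypothetical field names; inspect the real
    data/instructions.jsonl for the actual schema.
    """
    instructions = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                record = json.loads(line)
                instructions[record["dataset"]] = record["instruction"]
    return instructions

if __name__ == "__main__":
    # Demo with a synthetic file standing in for data/instructions.jsonl.
    sample = ('{"dataset": "scifact", "instruction": '
              '"Retrieve a scientific abstract that verifies the claim."}\n')
    with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
        f.write(sample)
        tmp_path = f.name
    print(load_instructions(tmp_path))
    os.remove(tmp_path)
```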
2. Prepare Model Checkpoints
- Download the pretrained checkpoints of cocodr-large. Modify `MODEL_NAME` in `scripts/evaluate_retrieval.sh` and `scripts/evaluate_rerank.sh` to point to the folder that contains the cocodr-large weights.
- Download the pretrained checkpoints of monot5-3b-msmarco-10k. Modify the corresponding path in `rerank_util.py` to point to the folder that contains the monot5-3b-msmarco-10k weights.
- Download the checkpoints of ControlRetriever from Here and put `model.ckpt` into the `checkpoint` folder.
To use ControlRetriever for zero-shot retrieval and reranking, refer to the scripts provided at `scripts/evaluate_retrieval.sh` and `scripts/evaluate_rerank.sh`.
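Conceptually, ControlRetriever conditions retrieval on a natural-language instruction for each dataset. A minimal sketch of prepending an instruction to each query before it is encoded; the template (plain concatenation) and the function name are illustrative assumptions, not the repo's actual prompt format:

```python
def build_instructed_query(instruction: str, query: str) -> str:
    # Hypothetical template: simple space-separated concatenation.
    # The actual format used by the evaluation scripts may differ.
    return f"{instruction} {query}".strip()

instruction = "Retrieve a scientific paper abstract that verifies the claim."
queries = ["Vitamin D deficiency is linked to hypertension."]
instructed = [build_instructed_query(instruction, q) for q in queries]
print(instructed[0])
```

The instructed strings would then be fed to the query encoder in place of the raw queries.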
Our project is developed based on the following repositories:
- beir: a heterogeneous benchmark for information retrieval.
- pygaggle: a gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini.
If you find this work useful, please consider citing our paper as follows:
```bibtex
@misc{pan2024i3,
      title={I3: Intent-Introspective Retrieval Conditioned on Instructions},
      author={Kaihang Pan and Juncheng Li and Wenjie Wang and Hao Fei and Hongye Song and Wei Ji and Jun Lin and Xiaozhong Liu and Tat-Seng Chua and Siliang Tang},
      year={2024},
      eprint={2308.10025},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```