This repo contains the PyTorch implementation of ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval.
🤩 An upgraded version of ControlRetriever, I3: Intent-Introspective Retrieval Conditioned on Instructions, has been accepted by SIGIR 2024.
This repo is built on top of beir and pygaggle. Please refer to the corresponding repositories for installation of the Python environment.
1. Download BEIR Data
Please first follow the instructions to download the BEIR data. Then modify `BEIR_DIR` in `scripts/evaluate_retrieval.sh` and `scripts/evaluate_rerank.sh` to point to the folder that contains the BEIR data. Finally, put `data/instructions.jsonl` into the folder that contains the BEIR data.
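Once `data/instructions.jsonl` is in place, the per-dataset instructions can be read in the usual JSONL way. A minimal sketch, assuming one JSON object per line with hypothetical keys `"dataset"` and `"instruction"` (check the actual file for the real field names):

```python
import json
import os
import tempfile

def load_instructions(path):
    """Map each dataset name to its retrieval instruction.

    Assumes one JSON object per line with keys "dataset" and
    "instruction" -- hypothetical field names; inspect the real
    data/instructions.jsonl for the actual schema.
    """
    instructions = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                record = json.loads(line)
                instructions[record["dataset"]] = record["instruction"]
    return instructions

if __name__ == "__main__":
    # Demo with a synthetic file standing in for data/instructions.jsonl.
    sample = ('{"dataset": "scifact", "instruction": '
              '"Retrieve a scientific abstract that verifies the claim."}\n')
    with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
        f.write(sample)
        tmp_path = f.name
    print(load_instructions(tmp_path))
    os.remove(tmp_path)
```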
2. Prepare Model Checkpoints
- Download the pretrained checkpoints of cocodr-large. Modify `MODEL_NAME` in `scripts/evaluate_retrieval.sh` and `scripts/evaluate_rerank.sh` to point to the folder that contains the cocodr-large weights.
- Download the pretrained checkpoints of monot5-3b-msmarco-10k. Modify the corresponding path in `rerank_util.py` to point to the folder that contains the monot5-3b-msmarco-10k weights.
- Download the checkpoints of ControlRetriever from Here and put `model.ckpt` into the `checkpoint` folder.
To use ControlRetriever for zero-shot retrieval and reranking, refer to the scripts provided at `scripts/evaluate_retrieval.sh` and `scripts/evaluate_rerank.sh`.
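Conceptually, ControlRetriever conditions retrieval on a natural-language instruction for each dataset. A minimal sketch of prepending an instruction to each query before it is encoded; the template (plain concatenation) and the function name are illustrative assumptions, not the repo's actual prompt format:

```python
def build_instructed_query(instruction: str, query: str) -> str:
    # Hypothetical template: simple space-separated concatenation.
    # The actual format used by the evaluation scripts may differ.
    return f"{instruction} {query}".strip()

instruction = "Retrieve a scientific paper abstract that verifies the claim."
queries = ["Vitamin D deficiency is linked to hypertension."]
instructed = [build_instructed_query(instruction, q) for q in queries]
print(instructed[0])
```

The instructed strings would then be fed to the query encoder in place of the raw queries.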
Our project is developed based on the following repositories:
- beir: a heterogeneous benchmark for information retrieval.
- pygaggle: a gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini.
If you find this work useful, please consider citing our paper as follows:
```bibtex
@misc{pan2024i3,
      title={I3: Intent-Introspective Retrieval Conditioned on Instructions},
      author={Kaihang Pan and Juncheng Li and Wenjie Wang and Hao Fei and Hongye Song and Wei Ji and Jun Lin and Xiaozhong Liu and Tat-Seng Chua and Siliang Tang},
      year={2024},
      eprint={2308.10025},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```