caiyinqiong/Awesome-Information-Retrieval-in-the-Age-of-Large-Language-Model

Awesome Information Retrieval in the Age of Large Language Model

A curated list of awesome papers about information retrieval (IR) in the age of large language models (LLMs). Topics include retrieval-augmented LLMs, LLMs for information retrieval, and related work. If I missed any papers, feel free to open a PR to include them. Feedback and contributions are welcome!

This list is currently maintained by Yinqiong Cai, Yu-An Liu, and Shiyu Nee at the CAS Key Lab of Network Data Science and Technology, ICT, CAS.

We thank all the great contributors very much.

Contents

Retrieval Augmented LLM
LLM for IR
Benchmark and Evaluation
Toolkits

Retrieval Augmented LLM

Pre-training Stage

REALM: Retrieval augmented language model pre-training. Kelvin Guu et al. ICML 2020.
Atlas: Few-shot Learning with Retrieval Augmented Language Models. Gautier Izacard et al. arXiv 2022.
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study. Boxin Wang et al. arXiv 2023.

Fine-tuning Stage

RAG: Retrieval-augmented generation for knowledge-intensive NLP tasks. Patrick Lewis et al. NeurIPS 2020.
FiD: Leveraging passage retrieval with generative models for open domain question answering. Gautier Izacard, Edouard Grave. EACL 2021.

Inference Stage

Generalization through memorization: Nearest neighbor language models. Urvashi Khandelwal et al. arXiv 2019.
Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. Harsh Trivedi et al. arXiv 2022.
Rethinking with retrieval: Faithful large language model inference. Hangfeng He et al. arXiv 2023.
REPLUG: Retrieval-Augmented Black-Box Language Models. Weijia Shi et al. arXiv 2023.

LLM for IR

Generating Synthetic Queries

InPars: Data augmentation for information retrieval using large language models. Luiz Bonifacio et al. SIGIR 2022.
UPR: Improving passage retrieval with zero-shot question generation. Devendra Singh Sachan et al. EMNLP 2022.
Promptagator: Few-shot dense retrieval from 8 examples. Zhuyun Dai et al. ICLR 2023.

Generating Synthetic Documents

Precise Zero-Shot Dense Retrieval without Relevance Labels. Luyu Gao et al. arXiv 2022.
Generating Synthetic Documents for Cross-Encoder Re-Rankers: A Comparative Study of ChatGPT and Human Experts. Arian Askari et al. arXiv 2023.

Generating Ranking Lists

Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent. Weiwei Sun et al. arXiv 2023.
Zero-Shot Listwise Document Reranking with a Large Language Model. Xueguang Ma et al. arXiv 2023.

Generate rather than Retrieve

Generate rather than retrieve: Large language models are strong context generators. Wenhao Yu et al. ICLR 2023.

Benchmark and Evaluation

KILT: a benchmark for knowledge intensive language tasks. Fabio Petroni et al. NAACL 2021.

Toolkits

RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit. Jiongnan Liu et al. arXiv 2023. RETA-LLM