This repo contains the code for reproducing the experiments on WikiText-103 from AI21 Labs' paper In-Context Retrieval-Augmented Language Models (In-Context RALM), to appear in the Transactions of the Association for Computational Linguistics (TACL).
Our code is mainly based on the Transformers and Pyserini libraries.
We test it on Python 3.8.
To install the required libraries in our repo, run:
pip install -r requirements.txt
To have a Pytorch version specific to your CUDA, install your version before running the above command.
Our BM25 preparation script works with Pyserini, so Java 11 is required - see their installation guide.
If you have Java 11 installed, make sure your JAVA_HOME
environment variable is set to the correct path.
On a Linux system, the correct path might look something like /usr/lib/jvm/java-11
.
Then run:
python prepare_retrieval_data.py \
--retrieval_type sparse \
--tokenizer_name $MODEL_NAME \
--max_length 1024 \
--dataset_path wikitext \
--dataset_name wikitext-103-v1 \
--dataset_split [validation, test] \
--index_name wikipedia-dpr \
--forbidden_titles_path ralm/retrievers/wikitext103_forbidden_titles.txt \
--stride 4 \
--output_file $RETRIEVAL_FILE \
--num_tokens_for_query 32 \
--num_docs 16
In the paper, we give the results for the following models (replace $MODEL_NAME
with one of those).
Note that the larger models may need model parallelism (on a 40GB A100, we used model parallelism for OPT-30B and OPT-66B).
See details below on how to apply this option.
- GPT-2:
gpt2
,gpt2-medium
,gpt2-large
,gpt2-xl
- GPT-Neo:
EleutherAI/gpt-neo-1.3B
,EleutherAI/gpt-neo-2.7B
,EleutherAI/gpt-j-6B
- OPT:
facebook/opt-125m
,facebook/opt-350m
,facebook/opt-1.3b
,facebook/opt-2.7b
,facebook/opt-6.7b
,facebook/opt-13b
,facebook/opt-30b
,facebook/opt-66b
To run evaluation on models without retrieval, please use the following command (you can increase stride
to 32 for faster evaluation):
python eval_lm.py \
--model_name $MODEL_NAME \
--dataset_path wikitext \
--dataset_name wikitext-103-v1 \
--dataset_split [validation, test] \
--output_dir $OUTPUT_DIR \
--stride 4 \
--max_length 1024 \
[--model_parallelism]
To run models with retrieval, use the $RETRIEVAL_FILE
output from the prepare_retrieval_data.py
script:
python eval_lm.py \
--model_name $MODEL_NAME \
--dataset_path wikitext \
--dataset_name wikitext-103-v1 \
--dataset_split [validation, test] \
--output_dir $OUTPUT_DIR \
--stride 4 \
--max_length 1024 \
[--model_parallelism] \
--retrieved_file $RETRIEVAL_FILE
Note: Our main retrieval flow assumes you want to use the top-scored passage from your retrieval file (--ranking_strategy first
).
Currently, we support logprob
(the zero-shot method described in subsection 6.1) and oracle
(to understand the potential gains from reranking).
For reranking, first you need to make sure you run the retrieval script with num_docs=16
(or any other number you want to rerank on).
If you enable multiple GPUs, data parallelism will automatically be applied (each GPU will get different retrieved documents to condition on).
Then run:
python eval_lm.py \
--model_name $MODEL_NAME \
--dataset_path wikitext \
--dataset_name wikitext-103-v1 \
--dataset_split [validation, test] \
--output_dir $OUTPUT_DIR \
--stride 4 \
--max_length 1024 \
[--model_parallelism] \
--retrieved_file $RETRIEVAL_FILE \
--ranking_strategy [logprob, oracle] \
--num_docs_to_rank 16 \
--ranking_logprob_past_tokens 16
To run our QA experiments on Natural Questions, start by downloading the datasets augmented by DPR results:
wget https://dl.fbaipublicfiles.com/dpr/data/retriever_results/single/nq-test.json.gz
gzip -d ./nq-test.json.gz
Then run the evaluation script:
python eval_qa.py \
--model_name $MODEL_NAME \
--dataset_path ./nq-test.json \
--output_dir $OUTPUT_DIR \
--num_docs [0,1,2] \
[--model_parallelism]
where num_docs
is the number of retrieved documents to include in-context (num_docs=0
is the closed-book setting, num_docs>=1
is open-book setting.)
If you find our paper or code helpful, please cite our paper:
@article{ram2023ralm,
author = {Ori Ram and Yoav Levine and Itay Dalmedigos and Dor Muhlgay and Amnon Shashua and Kevin Leyton-Brown and Yoav Shoham},
title = {In-Context Retrieval-Augmented Language Models},
year = {2023},
url = {https://arxiv.org/abs/2302.00083},
journal = {Transactions of the Association for Computational Linguistics}
}