In-context Learning Influences

[Paper] [Blog post]

Official implementation for "In-context Example Selection with Influences". We introduce in-context influences as a way to select examples for few-shot in-context learning. Authors: Tai Nguyen and Eric Wong.

Main figure

News

  • Todo - Release influence scores for all tasks and code for baselines
  • 04/18/2023 - Repository release
  • 04/06/2023 - Blog post release

Getting started

Create a new conda environment using environment.yml. The env is called "icl-influences" by default.

conda env create -f environment.yml
conda activate icl-influences

Alternatively, use the provided Dockerfile to build your own Docker image.

Usage

Download data

Directory data-train400-dev200 holds the subsampled data from our paper. We conducted experiments on 9 SuperGLUE tasks.

To redownload these datasets from HuggingFace, please run the following command.

python data_download.py

In addition to downloading, the script automatically samples a specified number of examples for train/dev/test data splits.
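
For reference, here is a minimal sketch of how a subsampled split could be loaded for inspection. The per-task directory layout and file names below are assumptions and may differ from what data_download.py actually writes.

import json
from pathlib import Path

def load_split(task: str, split: str, root: str = "data-train400-dev200"):
    # Assumed layout: one JSONL file per split under <root>/<task>/ (adjust to the real output).
    path = Path(root) / task / f"{split}.jsonl"
    with open(path) as f:
        return [json.loads(line) for line in f]

train = load_split("boolq", "train")
print(len(train), sorted(train[0].keys()))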

Compute in-context influence scores

To compute in-context influences for a specific task and model, we first need to obtain a number of "training runs".

The following script 1) obtains the training runs, and 2) computes influence scores for both influence-based methods discussed in Section 3.1. By default, we write training run results to out/ and influence scores to influence_scores.jsonl.

python icl_influence.py --task=hellaswag \
                        --model_name_or_path=facebook/opt-6.7b \
                        --shot=46 \
                        --iterations=650 \
                        --cache_dir=<HF_CACHE_DIR>

In the above script, note that we pass in:

  • --shot: The number of examples used in each few-shot prompt
  • --iterations: The number of training runs evaluated on the Dev set
  • --cache_dir: (Optional) Directory for caching all models downloaded from HuggingFace

We recommend specifying the maximum number of shots that fits in the model's context window. Larger prompts mean fewer iterations are needed to achieve good coverage of all train examples.
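
For intuition, the snippet below sketches a simple difference-in-means aggregation over training runs, in the spirit of the influence-based methods of Section 3.1: an example's influence is estimated as the mean dev metric of prompts that include it minus the mean of prompts that exclude it. The run record fields ("example_ids", "dev_accuracy") are assumptions; icl_influence.py is the authoritative implementation.

from collections import defaultdict

def _mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

def incontext_influence(runs, all_ids):
    # runs: list of {"example_ids": [...], "dev_accuracy": float} records, one per training run
    # (field names are assumptions; see the files written to out/ for the real schema).
    with_ex, without_ex = defaultdict(list), defaultdict(list)
    for run in runs:
        included = set(run["example_ids"])
        for ex_id in all_ids:
            (with_ex if ex_id in included else without_ex)[ex_id].append(run["dev_accuracy"])
    # Influence of an example = mean dev metric when it appears in the prompt
    # minus mean dev metric when it does not.
    return {ex_id: _mean(with_ex[ex_id]) - _mean(without_ex[ex_id]) for ex_id in all_ids}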

Evaluate

After influence scores are computed, run evaluation as follows.

python evaluate.py --task=hellaswag \
                   --model_name_or_path=facebook/opt-6.7b \
                   --split=test \
                   --method=incontext_influence_positive \
                   --resource_file=influence_scores.jsonl \
                   --cache_dir=<HF_CACHE_DIR>

The script selects a pre-defined number of examples k for each task, defined in evaluate.SHOT_MAP (the same settings used for in-context influence computation).
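
The sketch below illustrates how a positive-influence selection could pick the top-k examples from influence_scores.jsonl; the field names "example_id" and "influence" are assumptions about the file's schema.

import json

def top_k_positive(resource_file: str, k: int):
    # Keep the k training examples with the highest (most positive) influence scores.
    with open(resource_file) as f:
        scores = [json.loads(line) for line in f]
    scores.sort(key=lambda r: r["influence"], reverse=True)  # assumed field name
    return [r["example_id"] for r in scores[:k]]             # assumed field name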

How to add your own data?

  1. Add a method to data_download.py for downloading your own data. Keep the data fields similar to the current datasets.
  2. Add the task type of your newly added task to task_config.json.
    1. If the task type is neither multiple-choice nor binary classification (i.e., "free-form" text generation), you should also modify the inference and encode methods in utils.py.
    2. As an alternative to accuracy, you can define your own evaluation metric by modifying icl_datamodel.py.
  3. Add a new prompt template to templates.py (a rough sketch follows this list).
  4. Rerun the same pipeline.
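
As a rough illustration of step 3, a hypothetical template function is shown below; mirror the format of the existing entries in templates.py, since the exact signature expected there may differ.

def my_task_template(example: dict) -> str:
    # Placeholder template for a hypothetical binary classification task;
    # the field names ("passage", "question") are illustrative only.
    return (
        f"Passage: {example['passage']}\n"
        f"Question: {example['question']}\n"
        "Answer:"
    )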

Models available

We currently include working pipelines for 4 autoregressive model families: GPT-2, OPT, GPT-J/NeoX, and LLaMA. To save memory, we load all models in half precision (fp16) wherever possible. For LLaMA, please provide the path to your converted weights, following HF's official guide.
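
As an illustration, loading one of the supported models in half precision with HuggingFace transformers looks roughly like the following; the scripts above handle this internally.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-6.7b"  # any supported GPT-2 / OPT / GPT-J / NeoX / LLaMA checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # fp16 to reduce memory, as done in the repository where possible
)
model.eval()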

Citation

If you find our work helpful, please cite:

@article{nguyen2023incontextinfluences,
  author  = {Nguyen, Tai and Wong, Eric},
  title   = {In-context Example Selection with Influences},
  journal = {arXiv},
  year    = {2023}
}