Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models
This is the official repository for our paper, *Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models*. If you find our work useful, please cite:
```bibtex
@misc{fu2023transformers,
      title={Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models},
      author={Deqing Fu and Tian-Qi Chen and Robin Jia and Vatsal Sharan},
      year={2023},
      eprint={2310.17086},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
Our code is largely adapted from this prior work.
You can start by cloning our repository and following the steps below.
- Install the dependencies for our code using Conda. You may need to adjust the environment YAML file depending on your setup.

  ```shell
  conda env create -f environment.yml
  conda activate transformers_icl_opt
  ```
- Download the model checkpoints and extract them in the current directory.

  ```shell
  wget https://github.com/dtsip/in-context-learning/releases/download/initial/models.zip
  unzip models.zip
  ```
- Run probing for each Transformer layer.

  ```shell
  cd src
  python probing.py
  ```
- Compute the Transformer's similarity to both Iterative Newton's Method and Gradient Descent.

  ```shell
  python eval_similarity.py
  ```
  This will plot Fig. 1(a) and Fig. 3 from the paper under a new folder `eval`.
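The probing step above fits linear probes on the Transformer's hidden states. As a rough, self-contained illustration of what such a probe does (all names here are hypothetical and do not reflect the repository's actual API), the sketch below fits a ridge-regression map from hidden states to a candidate target quantity and scores it with R²:

```python
import numpy as np

def fit_linear_probe(H, Z, ridge=1e-3):
    """Fit a ridge-regression probe W mapping hidden states H (n, d_h)
    to target quantities Z (n, d_z). Illustrative sketch only."""
    d = H.shape[1]
    # Closed-form ridge solution: (H^T H + ridge * I)^{-1} H^T Z
    return np.linalg.solve(H.T @ H + ridge * np.eye(d), H.T @ Z)

def probe_r2(H, Z, W):
    """Coefficient of determination (R^2) of the probe's predictions."""
    resid = Z - H @ W
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((Z - Z.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot
```

A high R² for a layer suggests the probed quantity is linearly decodable from that layer's hidden states.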
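The similarity step above compares the Transformer against Iterative Newton's Method and Gradient Descent. As background, here is a minimal NumPy sketch of one such higher-order iteration for linear regression, the Newton–Schulz iteration for approximating the inverse Hessian (the function name and initialization are illustrative assumptions, not the repository's exact implementation):

```python
import numpy as np

def newton_schulz_lstsq(X, y, steps=25):
    """Solve min_w ||Xw - y||^2 by iteratively approximating (X^T X)^{-1}
    with the Newton-Schulz update M_{k+1} = M_k (2I - S M_k).
    Illustrative sketch, not the repository's code."""
    S = X.T @ X
    d = S.shape[0]
    # M_0 = S / ||S||_2^2 places every eigenvalue of S M_0 in (0, 1],
    # which guarantees quadratic convergence of the iteration.
    M = S / np.linalg.norm(S, ord=2) ** 2
    I = np.eye(d)
    for _ in range(steps):
        M = M @ (2 * I - S @ M)
    return M @ X.T @ y
```

Unlike gradient descent, whose error shrinks linearly per step, this iteration squares the residual `I - S M` each step, so it reaches the least-squares solution in far fewer iterations.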