This repository contains the source code for the paper *Show, Don't Tell: Aligning Language Models with Demonstrated Feedback* by Omar Shaikh, Michelle Lam, Joey Hejna, Yijia Shao, Michael Bernstein, and Diyi Yang. Feel free to reach out to Omar Shaikh with any questions!
Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (<10) of demonstrations as feedback.
We build on the alignment-handbook repo. Here are the steps to get set up!
First, create a Python virtual environment using e.g. Conda:
```shell
conda create -n ditto python=3.10 && conda activate ditto
```
Next, install PyTorch v2.1.2. We used the following:
```shell
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
```
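To sanity-check the install before moving on, a one-liner like the following can help (this is our suggestion rather than part of the original setup, and it assumes a CUDA-capable machine):

```shell
# Print the installed PyTorch version and whether CUDA is visible
# (expects "2.1.2" and "True" on a GPU machine)
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```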
Finally, install the alignment-handbook dependencies.
```shell
git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .
```
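As a quick smoke test (assuming the handbook installs under the module name `alignment`, as in the upstream repo's `src/` layout):

```shell
# The import should succeed without errors if the install worked
python -c "import alignment"
```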
A sample shell script that runs both training and generation lives in `run.sh` (it fine-tunes Mistral 7B Instruct v0.2). Right now, it is set to fine-tune on the email examples; the script takes an argument for trying the other datasets in the paper (see the sketch after the command below). Note that you may need to change the config files for your specific hardware or dataset.
```shell
bash run.sh
```
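If you want to try one of the other datasets, the sketch below shows the general shape of the call; the positional-argument form is an assumption on our part, so check `run.sh` for the exact argument it expects:

```shell
# Hypothetical invocation: swap in a dataset name from the paper
bash run.sh <dataset_name>
```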
Feel free to cite us using the following BibTeX entry:
```bibtex
@misc{shaikh2024show,
  title={Show, Don't Tell: Aligning Language Models with Demonstrated Feedback},
  author={Omar Shaikh and Michelle Lam and Joey Hejna and Yijia Shao and Michael Bernstein and Diyi Yang},
  year={2024},
  eprint={2406.00888},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```