Current instruction-following data generally place the task instruction before the input sentence (referred to as "Pre-Ins") for sequence generation tasks (e.g., machine translation). We observe that LLMs may forget the frontmost task instruction when the input sentence is long, so we propose to simply place the task instruction after the input sentence (referred to as "Post-Ins"). Both our theoretical and experimental analyses show that Post-Ins pays more attention to the model's instruction-following capabilities, yielding consistent performance improvements across two common sequence generation tasks. For more details, please refer to our technical report.
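For reference, the decomposition at the heart of the theoretical analysis can be written as follows; this is a condensed reconstruction, so please see the technical report for the precise derivation. By Bayes' rule,

$$P(\text{res} \mid \text{inst}, \text{inp}) \propto P(\text{inp} \mid \text{inst}, \text{res}) \cdot P(\text{res} \mid \text{inst}) \quad \text{(Pre-Ins)}$$

$$P(\text{res} \mid \text{inp}, \text{inst}) \propto P(\text{inst} \mid \text{inp}, \text{res}) \cdot P(\text{res} \mid \text{inp}) \quad \text{(Post-Ins)}$$

where the omitted denominators, $P(\text{inp} \mid \text{inst})$ and $P(\text{inst} \mid \text{inp})$, do not depend on the response.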
Here are self-attention visualizations of models trained with both data formats: Pre-Ins mainly focuses on the source input, while Post-Ins pays more attention to the specific task instruction. Below is an example script for plotting such a heatmap.
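This is a minimal sketch assuming a HuggingFace causal LM; the checkpoint name, example prompt, and plot styling are illustrative placeholders rather than the repo's actual plotting code.

# plot_attention_heatmap.py (illustrative sketch)
import torch
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Post-Ins layout: source input first, task instruction afterwards.
prompt = "Ich liebe dich.\n\nTranslate the above sentence from German to English."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions holds one (batch, heads, seq, seq) tensor per layer;
# average the heads of the last layer to get a single seq x seq matrix.
attn = outputs.attentions[-1][0].mean(dim=0).numpy()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(attn, cmap="viridis")
ax.set_xticks(range(len(tokens)))
ax.set_yticks(range(len(tokens)))
ax.set_xticklabels(tokens, rotation=90, fontsize=6)
ax.set_yticklabels(tokens, fontsize=6)
ax.set_title("Last-layer self-attention (heads averaged)")
plt.tight_layout()
plt.savefig("attention_heatmap.png", dpi=200)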
In the formulas above, "inp", "inst", and "res" are abbreviations for "source input", "instruction", and "response", respectively. We have observed that the post-instruction format naturally encourages the model to pay more attention to the task instruction, while the pre-instruction format places more emphasis on modeling coverage.

Requirements
- transformers>=4.28.0.dev0
- python>=3.8.0
- torch>=1.10
- deepspeed>=0.8.3
- datasets>=2.9.0
Organizing original data into Post-Ins format
We provide the processed training data used in our experiments here, and you can also convert your own sentence pairs into the Post-Ins format with the following script:
sh scripts/organize_data.sh  # you can replace the file in this script with your own
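Conceptually, the conversion simply moves the task instruction from the front of the prompt to the end. The sketch below illustrates this for a single sentence pair; the field names and instruction template are illustrative and need not match those produced by scripts/organize_data.sh.

# build_post_ins_example.py (illustrative sketch)
import json

def build_examples(src: str, tgt: str, instruction: str) -> dict:
    """Return the same sentence pair in Pre-Ins and Post-Ins layouts."""
    pre_ins = {
        "prompt": f"{instruction}\n\n{src}",  # instruction first, then source input
        "response": tgt,
    }
    post_ins = {
        "prompt": f"{src}\n\n{instruction}",  # source input first, then instruction
        "response": tgt,
    }
    return {"pre_ins": pre_ins, "post_ins": post_ins}

if __name__ == "__main__":
    example = build_examples(
        src="Ich liebe dich.",
        tgt="I love you.",
        instruction="Translate the above sentence from German to English.",
    )
    print(json.dumps(example, ensure_ascii=False, indent=2))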
Fine-tuning LLMs
sh train/train_wmt.sh  # taking machine translation as an example
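For intuition, fine-tuning follows the usual supervised instruction-tuning recipe on the Post-Ins prompts. The sketch below shows the core step, masking the prompt tokens so that the loss is computed only on the response tokens; it is a simplified, single-example illustration (placeholder checkpoint, no batching or DeepSpeed) rather than a reproduction of train/train_wmt.sh.

# response_only_loss.py (illustrative sketch)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# Post-Ins prompt: source input first, instruction afterwards.
prompt = "Ich liebe dich.\n\nTranslate the above sentence from German to English.\n"
response = "I love you."

prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
response_ids = tokenizer(response, add_special_tokens=False)["input_ids"] + [tokenizer.eos_token_id]

input_ids = torch.tensor([prompt_ids + response_ids])
# -100 marks positions ignored by the cross-entropy loss,
# so only the response tokens contribute to training.
labels = torch.tensor([[-100] * len(prompt_ids) + response_ids])

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()  # an optimizer step would follow in a real training loop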
Testing
sh test/test_wmt.sh  # taking machine translation as an example
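For machine translation, the decoded hypotheses are then scored against the references. Below is a small scoring sketch assuming sacrebleu is installed; the file names are placeholders, and the actual test script may report additional metrics.

# score_wmt.py (illustrative sketch)
import sacrebleu

# One hypothesis/reference per line, in the same order.
with open("hypotheses.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")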
We provide all the model outputs for both the machine translation and text summarization tasks for easy comparison. Below are partial results of the experiments:
Results on WMT22 for machine translation.
Results on CNN/DailyMail for long text summarization.
Please cite this paper if you find this repo useful.
@article{liu2023instruction,
title={Instruction Position Matters in Sequence Generation with Large Language Models},
author={Liu, Yijin and Zeng, Xianfeng and Meng, Fandong and Zhou, Jie},
journal={arXiv preprint arXiv:2308.12097},
year={2023}
}
Please feel free to contact us (yijinliu@tencent.com) for any further questions.