This notebook demonstrates prefix tuning, where learned embeddings provide a comparatively low-memory way to fine-tune the behavior of a large language model by encoding behavior/information in a small number of embedding tokens. As an example, we fine-tune GPT-2 XL to translate from French to English and vice versa. The approach works essentially as follows: we learn two prefixes, `[start_english]` and `[start_french]`. When we want to translate a French sentence to English, we input the following prompt to the language model:

`[start_french]Bonjour! Je m'appelle Sully.[start_english]`
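To make the mechanics concrete, here is a minimal PyTorch/`transformers` sketch of the idea: the GPT-2 weights stay frozen, and only two small blocks of prefix embeddings are trainable and get spliced into the input embeddings. The prefix length, function names, and initialization below are illustrative choices, not the notebook's exact code.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Frozen base model; swap in "gpt2-xl" if you have the memory.
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.requires_grad_(False)
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

PREFIX_LEN = 8  # number of embedding tokens per learned prefix (assumed)
embed_dim = model.config.n_embd

# The only trainable parameters: embeddings playing the role of
# [start_french] and [start_english].
start_french = torch.nn.Parameter(torch.randn(PREFIX_LEN, embed_dim) * 0.02)
start_english = torch.nn.Parameter(torch.randn(PREFIX_LEN, embed_dim) * 0.02)

def build_inputs(src_text, src_prefix, tgt_prefix):
    """Embed the source sentence and surround it with the learned prefixes:
    [src_prefix] <source tokens> [tgt_prefix] -> the model then generates
    the translation after the target prefix."""
    ids = tokenizer(src_text, return_tensors="pt").input_ids
    tok_embeds = model.transformer.wte(ids)  # (1, T, D)
    return torch.cat(
        [src_prefix.unsqueeze(0), tok_embeds, tgt_prefix.unsqueeze(0)], dim=1
    )

# French -> English: condition on [start_french] ... [start_english]
inputs_embeds = build_inputs("Bonjour! Je m'appelle Sully.", start_french, start_english)
logits = model(inputs_embeds=inputs_embeds).logits
print(logits.shape)  # (1, PREFIX_LEN + source length + PREFIX_LEN, vocab_size)
```

During training, only `start_french` and `start_english` would be passed to the optimizer, with the usual language-modeling loss applied to the target tokens appended after the prompt.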
We can also do the same task via few-shot prompting of the language model. At the end of this notebook, we compare (via BLEU score) the performance of prefix tuning with few-shot translation and show a marked improvement. Ideally, we would also compare fine-tuning the whole model on the dataset against prefix tuning, but I don't have enough GPU memory on my machine to run that test. Anyway, we see a ~2x improvement in BLEU score from prefix tuning versus 3-shot prompting!
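For reference, the BLEU comparison can be computed with NLTK along these lines; this is a toy sketch using the single example pair shown further down, whereas the notebook scores the whole evaluation set.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Toy example: ground-truth reference vs. the two model outputs shown below.
reference = "This is done, in part, to better accommodate the needs of senators who are often members of several committees.".split()
prefix_tuned = "This is done for the benefit of the participants, who are members of many different organizations.".split()
three_shot = "The program is designed to help participants become more involved in their communities.".split()

# Smoothing avoids degenerate zero scores on short, single-sentence inputs.
smooth = SmoothingFunction().method1
bleu_prefix = corpus_bleu([[reference]], [prefix_tuned], smoothing_function=smooth)
bleu_3shot = corpus_bleu([[reference]], [three_shot], smoothing_function=smooth)
print(f"prefix-tuning BLEU: {bleu_prefix:.3f} | 3-shot BLEU: {bleu_3shot:.3f}")
```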
- Dataset (we use only ~200k examples for training and evaluation, not the full dataset. If you don't have enough memory to load the whole dataset, just load a subset instead; a sketch follows this list!)
- PyTorch (for machine learning)
- NLTK (for BLEU score)
- NumPy (for math)
- tqdm (for progress bars)
- HuggingFace `transformers` (for GPT-2 XL)
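In case the full dataset doesn't fit in memory, a simple way to take a ~200k-example subset is to stop reading after the first N sentence pairs. The file name and tab-separated layout below are assumptions for illustration, not necessarily the linked dataset's actual format.

```python
# Load only the first NUM_EXAMPLES French/English pairs from a TSV file.
# File name and "french<TAB>english" column order are assumed.
NUM_EXAMPLES = 200_000

pairs = []
with open("french_english_pairs.tsv", encoding="utf-8") as f:
    for i, line in enumerate(f):
        if i >= NUM_EXAMPLES:
            break
        french, english = line.rstrip("\n").split("\t")[:2]
        pairs.append((french, english))

print(f"loaded {len(pairs)} sentence pairs")
```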
Please see the Jupyter notebook in the repository for a detailed explanation and write-up!
- Original French: Cela est fait, en partie, pour répondre aux besoins des sénateurs, qui sont souvent membres de plusieurs comités.
- Ground Truth English: This is done, in part, to better accommodate the needs of senators who are often members of several committees.
- GPT-2 XL 3-shot: The program is designed to help participants become more involved in their communities.
- GPT-2 XL prefix-tuning: This is done for the benefit of the participants, who are members of many different organizations.