Efficient-Head-Finetuning

Source code for the EMNLP 2022 long paper "Parameter-Efficient Tuning Makes a Good Classification Head".

We found that the following two-step procedure (sketched below) usually works better than direct finetuning:

  1. Finetune the pretrained LM with a parameter-efficient algorithm.
  2. Fully finetune the pretrained LM, initializing the classification head with the head weights obtained in step 1.
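
To make the two steps concrete, here is a minimal PyTorch sketch using BitFit as the parameter-efficient method in step 1. The ToyClassifier, training loop, and hyperparameters are illustrative stand-ins, not the SwissArmyTransformer-based implementation in this repository.

```python
import copy
import torch
import torch.nn as nn

class ToyClassifier(nn.Module):
    """Stand-in for a pretrained LM encoder plus a classification head."""
    def __init__(self, hidden=64, num_labels=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh())
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, x):
        return self.head(self.encoder(x))

def train(model, steps=10, lr=1e-3):
    """A few optimization steps on random data, standing in for real finetuning."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        x, y = torch.randn(8, 64), torch.randint(0, 2, (8,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

torch.manual_seed(0)
pretrained = ToyClassifier()        # stands in for the pretrained LM

# Step 1: parameter-efficient tuning (BitFit): train only the bias terms
# of the encoder plus the randomly initialized classification head.
step1 = copy.deepcopy(pretrained)
for name, p in step1.named_parameters():
    p.requires_grad = name.endswith(".bias") or name.startswith("head.")
train(step1)

# Step 2: full finetuning of the pretrained LM, with the classification
# head initialized from the head learned in step 1.
step2 = copy.deepcopy(pretrained)   # start again from pretrained backbone weights
step2.head.load_state_dict(step1.head.state_dict())
train(step2)
```

The point of step 2 is that the backbone starts again from the pretrained weights, while the head starts from the weights learned in step 1 rather than from a random initialization.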

We implement our methods based on the open-source library SwissArmyTransformer.

Step 1.

Download the checkpoint of RoBERTa-Large or BERT-Large (provided by SwissArmyTransformer) and decompress it.

Step 2.

Add the checkpoint directory path to line 5 of EH-FT/roberta/scripts/finetune.sh.

Step 3.

cd EH-FT/roberta
python scripts/run_multiseed.py --number-gpu 1 --gpu-s 0 --seed-per-gpu 1 --dataset rte --finetune-type 2step+bitfit

The script will launch [number-gpu] processes on GPUs [gpu-s], [gpu-s+1], ..., [gpu-s + number-gpu - 1]. Each process uses a different random seed.
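
As a rough illustration of that behavior, the sketch below launches one process per (GPU, seed) pair. The flag semantics follow the description above, but the command it builds (a hypothetical call to scripts/finetune.sh with positional arguments), the seed choice, and the scheduling of runs that share a GPU are assumptions, not the actual logic of scripts/run_multiseed.py.

```python
import os
import subprocess

def run_multiseed(number_gpu, gpu_s, seed_per_gpu, dataset, finetune_type):
    """Launch one finetuning process per (GPU, seed) pair and wait for all of them."""
    procs = []
    for i in range(number_gpu):
        gpu = gpu_s + i                           # gpu-s, gpu-s+1, ...
        for j in range(seed_per_gpu):
            seed = 1234 + gpu * seed_per_gpu + j  # assumption: any distinct seeds
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
            # Assumption: finetune.sh takes the dataset, finetune type, and seed
            # as positional arguments; the real script may differ.
            cmd = ["bash", "scripts/finetune.sh", dataset, finetune_type, str(seed)]
            procs.append(subprocess.Popen(cmd, env=env))
    for p in procs:
        p.wait()

if __name__ == "__main__":
    run_multiseed(number_gpu=1, gpu_s=0, seed_per_gpu=1,
                  dataset="rte", finetune_type="2step+bitfit")
```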

You can change the --dataset and --finetune-type options.

Datasets: rte, mrpc, boolq, wic, cb, copa, wsc, qnli, stsb

Finetune-type    Name in paper
all              traditional finetuning
2step+head       LP-FT
2step+bitfit     EH-FT(BitFit)
2step+lora       EH-FT(LoRA)
2step+pt         EH-FT(PT)
bitfit/lora/pt   BitFit/LoRA/Prefix tuning
head             Linear Probing
child            Child-Tuning
mixout           Mixout

Step 4.

See the results in runs/ using TensorBoard.