This repository provides a general framework for building multitask learning systems, based on our paper published at TMLR 2023. We propose a scalable approach to identifying negative transfers that occur in multitask learning through surrogate modeling. Our approach selects subsets of auxiliary source tasks to optimize the performance of a primary target task, and it consists of two main components:
- First, we sample random subsets of source tasks and precompute their multitask learning performance. We fit a linear regression model to these performances and use it to predict the performance of unseen task subsets. The fitted model provides a relevance score between each source task and the primary target task.
- Second, we perform subset selection on the source tasks by choosing the tasks whose relevance score is above a threshold, which is obtained by optimizing the surrogate model over task subsets. (A minimal sketch of both steps follows this list.)
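To make the two steps concrete, here is a minimal sketch in plain NumPy with hypothetical data and a zero threshold; it illustrates the surrogate-modeling idea, not the repository's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_source_tasks, num_subsets = 10, 200

# Step 1: encode each sampled subset as a 0/1 indicator vector over source
# tasks, paired with its (placeholder) multitask learning performance.
subsets = rng.integers(0, 2, size=(num_subsets, num_source_tasks)).astype(float)
performance = rng.normal(loc=0.8, scale=0.05, size=num_subsets)

# Fit the linear surrogate (with an intercept) by least squares; the
# coefficients act as per-task relevance scores for the target task.
X = np.hstack([subsets, np.ones((num_subsets, 1))])
coefs, *_ = np.linalg.lstsq(X, performance, rcond=None)
relevance, intercept = coefs[:-1], coefs[-1]

# Predict the performance of an unseen subset, e.g., source tasks {0, 3, 7}.
unseen = np.zeros(num_source_tasks)
unseen[[0, 3, 7]] = 1.0
predicted = unseen @ relevance + intercept

# Step 2: maximizing the linear surrogate over subsets keeps exactly the
# tasks whose relevance score clears the threshold (zero in this sketch).
selected = np.flatnonzero(relevance > 0.0)
print(predicted, selected)
```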
We conduct our experiments in the following environment:
- Python >= 3.6
- PyTorch >= 1.10.0 with CUDA 11.3
- transformers >= 4.12.5. We include a full copy of Hugging Face Transformers in our code for convenience.
- datasets >= 1.17.0. We use datasets to manage and download the NLP datasets.
- wrench >= 1.1.2. We refer readers to the Wrench benchmark repository for instructions on installing Wrench.
We provide the full list of packages used in our Python environment. To install the requirements:

```bash
pip install -r requirements.txt
```
The code in this repo provides an example of conducting multitask learning with BERT models. The folder structure is as follows:

```
Task-Modeling/
├── trainer/ - Contains the main logic for training and evaluation
│   ├── base_trainer.py - The basic logic for conducting multitask learning
│   └── ... - Implement your own trainer by inheriting BaseTrainer
├── models/
│   ├── modeling_multi_bert.py - BERT model with multiple prediction heads on top
│   ├── prediction_heads.py - Defines prediction heads for various tasks
│   └── ... - Add your own model architecture in transformers
├── data_loaders/ - The logic for processing different datasets
│   ├── multitask_dataset.py - The basic logic for combining multiple datasets
│   ├── collators.py - Batch collators for various datasets
│   ├── load_data_fns.py - Functions to load raw data into Hugging Face datasets
│   └── ...
├── metrics/ - Evaluation metrics for various tasks
│   └── ... - Write your own evaluation metric if it is not defined in datasets
├── utils/ - Small utility functions
│   └── ...
├── results/ - Files that store the multitask learning performances
├── affinity/ - Files that store the task affinity scores
├── train_multitask.py
├── ...
```
Customization:
- Add a new dataset (a hypothetical sketch of these registry entries follows this list):
  - Specify the `task_to_benchmark`, `task_to_collator`, and `task_to_load_fns` entries in `data_loaders/__init__.py`. Write specific collator and load functions in `collators.py` and `load_data_fns.py` if the new dataset does not fit into the default collator and loading logic.
  - Specify the `task_to_metric_name` entry in `metrics/__init__.py`. If the new metric is not defined in the `datasets` package, write a new metric file under the `./metrics` folder following the huggingface-datasets instructions.
  - Specify the `task_to_prediction_heads` entry in `models/__init__.py`.
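For illustration, the sketch below shows how these registry entries might look. The task name, helper functions, and values are hypothetical; the actual keys and value formats in the repository may differ.

```python
# data_loaders/__init__.py (hypothetical entries; actual formats may differ)
from data_loaders.collators import my_new_collator          # assumed helper in collators.py
from data_loaders.load_data_fns import load_my_new_dataset  # assumed helper in load_data_fns.py

task_to_benchmark["my_new_task"] = "my_benchmark"      # benchmark the raw data comes from
task_to_collator["my_new_task"] = my_new_collator      # batches examples for the model
task_to_load_fns["my_new_task"] = load_my_new_dataset  # loads raw data into a datasets.Dataset

# metrics/__init__.py
task_to_metric_name["my_new_task"] = "f1"  # a metric resolvable by the datasets package

# models/__init__.py
task_to_prediction_heads["my_new_task"] = "ClassificationHead"  # a head in prediction_heads.py
```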
- Add a new architecture: Write a new file for the new architecture and rewrite the `forward()` function to enable taking multitask dataset batches. See the `MultitaskBertForClassification` class in `modeling_multi_bert.py` for an example; a rough sketch of the idea follows.
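The sketch below shows one way such a `forward()` could dispatch batches to task-specific heads. It is an illustration only, not the repository's `MultitaskBertForClassification`; the head interface and batch fields are assumptions.

```python
import torch.nn as nn
from transformers import BertModel

class MultitaskBertSketch(nn.Module):
    """Shared BERT encoder with one prediction head per task (illustration only)."""

    def __init__(self, task_to_head: dict):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # e.g., task_to_head = {"cola": nn.Linear(768, 2), "stsb": nn.Linear(768, 1)}
        self.heads = nn.ModuleDict(task_to_head)

    def forward(self, input_ids, attention_mask, task_name):
        # Encode the batch once with the shared backbone.
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output
        # Route the pooled representation to the head of the batch's task.
        return self.heads[task_name](pooled)
```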
## Experiments on Multitask Weak Supervision

Thanks to Zhang et al. for providing the weak supervision datasets online. Please visit their GitHub page to download the datasets into the `./ws_datasets` folder. For example, unzip the cdr dataset as `./ws_datasets/cdr/`.
Use `python train_multitask_ws.py` to run the code. Specify the following main arguments:
- `--ws_task_name` specifies the text dataset name. Choose from `youtube`, `trec`, `cdr`, `chemprot`, and `semeval`.
- `--lf_idxes` specifies the weak supervision source indexes. `0` represents the full dataset with true labels. Weak supervision sources range from `1` to the number of weak supervision sources. Please see the WRENCH benchmark for the number of sources in each dataset.
- `--save_name` specifies the name of a `.csv` file to save the results of training. This is crucial if one wants to save the results for further use.
Refer to the following scripts to conduct the experiments.
```bash
python train_multitask_ws.py --use_ws_dataset --use_one_predhead \
--ws_task_name youtube --model_name_or_path bert-base-uncased --lf_idxes 6 2 5 7 4 3 \
--monitor_mode max --monitor_metric val_youtube_0_f1 \
--lr 5e-5 --epochs 5 --runs 5 --device 0 --save_name ws_baseline
```

```bash
python train_multitask_ws.py --use_ws_dataset --use_one_predhead \
--ws_task_name trec --model_name_or_path bert-base-uncased \
--lf_idxes 65 18 45 40 52 68 27 67 66 63 48 28 2 14 31 11 7 26 24 20 64 54 8 9 53 3 23 22 16 50 57 19 43 46 35 62 10 15 41 49 37 1 47 33 6 34 56 30 60 21 4 17 44 55 13 59 58 39 32 25 29 51 61 38 \
--monitor_mode max --monitor_metric val_trec_0_f1 \
--lr 1e-4 --epochs 5 --runs 5 --device 0 --save_name ws_baseline
```

```bash
python train_multitask_ws.py --use_ws_dataset --use_one_predhead \
--ws_task_name cdr --model_name_or_path bert-base-uncased --lf_idxes 9 15 13 4 25 20 19 10 2 7 23 33 1 30 31 3 6 8 16 28 17 21 24 27 29 22 12 26 14 32 18 \
--monitor_mode max --monitor_metric val_cdr_0_f1 \
--lr 2e-5 --epochs 5 --runs 5 --device 0 --save_name ws_baseline
```

```bash
python train_multitask_ws.py --use_ws_dataset --use_one_predhead \
--ws_task_name chemprot --model_name_or_path bert-base-uncased --lf_idxes 14 17 16 4 6 13 23 12 8 20 1 5 3 19 21 25 7 10 15 2 \
--monitor_mode max --monitor_metric val_chemprot_0_accuracy \
--lr 2e-5 --gradient_accumulation_steps 2 --epochs 5 --runs 5 --device 0 --save_name ws_baseline
```

```bash
python train_multitask_ws.py --use_ws_dataset --use_one_predhead \
--ws_task_name semeval --model_name_or_path bert-base-uncased \
--lf_idxes 133 135 41 134 130 129 44 42 136 132 131 43 51 138 56 53 54 48 52 45 46 57 49 37 64 12 13 118 103 17 80 50 55 47 60 25 107 35 27 38 102 71 157 6 65 32 86 92 99 72 29 16 31 122 84 73 30 140 74 76 78 105 91 23 18 20 15 81 95 93 152 79 156 88 4 63 59 145 21 11 154 94 3 96 153 9 75 66 114 110 106 7 89 148 97 98 77 128 14 85 116 82 10 19 62 123 127 34 108 112 68 69 67 26 40 1 87 61 28 164 2 58 124 139 5 104 149 155 146 121 125 109 120 150 8 70 100 90 119 158 22 83 101 141 142 143 160 33 163 39 151 113 162 24 159 111 36 126 117 \
--monitor_mode max --monitor_metric val_semeval_0_f1 \
--lr 1e-4 --epochs 5 --runs 5 --device 0 --save_name ws_baseline
```
## Experiments with NLP tasks

Use `python train_multitask.py` to run the code. Specify `--task_names` to list the tasks combined in multitask learning.
Refer to the following scripts to conduct the experiments.
```bash
python train_multitask.py --task_names cola cb anli_r1 stance_hillary \
--model_name_or_path bert-base-cased --lr 2e-5 --batch_size 8 --gradient_accumulation_steps 4 \
--device 1 --save_name selected_tasks
```

```bash
python train_multitask.py --task_names rte cb anli_r1 anli_r2 \
--model_name_or_path bert-base-cased --lr 5e-5 --batch_size 8 --gradient_accumulation_steps 4 \
--device 1 --save_name selected_tasks
```

```bash
python train_multitask.py --task_names cb stance_abortion boolq stance_atheism \
--model_name_or_path bert-base-cased --lr 5e-5 --batch_size 8 --gradient_accumulation_steps 4 \
--device 1 --save_name selected_tasks
```

```bash
python train_multitask.py --task_names copa rte boolq emotion cb hate stance_abortion stsb \
--model_name_or_path bert-base-cased --lr 5e-5 --batch_size 8 --gradient_accumulation_steps 4 \
--device 1 --save_name selected_tasks
```

```bash
python train_multitask.py --task_names wsc stance_climate boolq stsb cola stance_hillary stance_feminist irony \
--model_name_or_path bert-base-cased --lr 4e-5 --batch_size 8 --gradient_accumulation_steps 4 \
--device 1 --save_name selected_tasks
```
## Sampling Subsets for Task Modeling

Use `train_sample_ws_tasks.py` and `train_sample_tasks.py` to sample subsets and collect results for task modeling. For example, a script to train sampled subsets on the YouTube dataset is shown below.

```bash
python train_sample_ws_tasks.py \
--ws_task_name youtube --num_samples 200 \
--min_task_num 3 --max_task_num 3 \
--device 0 --lr 2e-5 --epochs 5 --runs 5 \
--task_set_name ws_sampled_tasks_youtube \
--save_name ws_sampled_tasks_youtube
```
We describe the important arguments below. For the rest of the arguments, please refer to the script `train_sample_ws_tasks.py`.
- `--min_task_num` and `--max_task_num` specify the number of source tasks that are sampled into the training. The script uniformly samples subsets of source tasks whose size is between `min_task_num` and `max_task_num`.
- `--task_set_name` specifies a file that records the sampled task subsets to avoid repeated results.
- `--save_name` specifies the name of a `.csv` file to store the training results. This is crucial since we need the results for learning task models.
For `train_sample_tasks.py`, use `--target_task` to specify the primary target task of the modeling; a hypothetical invocation is shown below.
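The following invocation is a guess that assumes `train_sample_tasks.py` shares the sampling arguments described above with `train_sample_ws_tasks.py`; only `--target_task` is confirmed here, so check the script for the exact argument set.

```bash
python train_sample_tasks.py \
--target_task rte --num_samples 200 \
--min_task_num 3 --max_task_num 3 \
--device 0 --lr 2e-5 --epochs 5 --runs 5 \
--task_set_name sampled_tasks_rte \
--save_name sampled_tasks_rte
```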
If you find this repository useful or happen to use it in a research paper, please cite our work with the following BibTeX entry:

```bibtex
@article{li2023identification,
  title={Identification of Negative Transfers in Multitask Learning Using Surrogate Models},
  author={Li, Dongyue and Nguyen, Huy and Zhang, Hongyang R.},
  journal={Transactions on Machine Learning Research},
  year={2023}
}
```
Thanks to the authors of the following repositories for making their implementations publicly available.