This directory contains the anonymized code for our paper "An Adaptive Layer to Leverage both Domain and Task Specific Information from Scarce Data", published at AAAI 2023. This code allows you to run our model (TAFT), its variant (TAFT noAdapt), and the baselines (DAPT, TAPT) on GPUs. Please refer to the paper, in particular Table 2, for more information.
Run

```
bash runs.sh
```

to launch the experiments, or open this file to see the commands for each experiment.
If you find this work useful, please cite the following paper:

```bibtex
@inproceedings{guibon2023adapt,
  title={An Adaptive Layer to Leverage both Domain and Task Specific Information from Scarce Data},
  author={Guibon, Ga{\"e}l and Labeau, Matthieu and Lefeuvre, Luce and Clavel, Chlo{\'e}},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2023}
}
```
To run this code you need several Python packages. Please refer to the requirements.txt file for the full list of packages in the Python environment used for the experiments, along with their versions. The main dependencies are PyTorch and its ecosystem, PyTorch Lightning, scikit-learn, and Hugging Face's Transformers.
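If needed, the dependencies can typically be installed with `pip install -r requirements.txt` (assuming a standard pip-based setup).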
Running the scripts and the training will require disk space to save the models: the code automatically creates directories and new files (models, TensorBoard logs, plots, and reports).
GPU environment information is as follows (output from the `nvidia-smi` command):
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100S-PCI...  On   | 00000000:25:00.0 Off |                    0 |
| N/A   27C    P0    24W / 250W |      4MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```
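Before launching the experiments, you can quickly check that PyTorch sees your GPU (a generic sanity check, not part of this repository):

```python
import torch

# Sanity check: confirm a CUDA GPU is visible to PyTorch before training.
print(torch.cuda.is_available())           # expected: True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. a Tesla V100S in our setup
```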
Since we cannot legally share our confidential customer service dataset, we provide a dummy dataset that mimics the data preparation process and the exact structure of the original data.
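Conceptually, the generation step amounts to writing a JSON file of labeled samples. The sketch below is purely illustrative: the actual schema written by `data/generate_fake_dataset.py` mirrors the confidential data structure and is not reproduced here, so every field name and label below is a placeholder.

```python
# Purely illustrative sketch of a dummy-dataset generator. All field
# names, labels, and defaults are placeholders, NOT the actual schema
# produced by data/generate_fake_dataset.py.
import json
import random

PLACEHOLDER_LABELS = ["label_a", "label_b"]  # hypothetical label set

def generate_dummy(n=100, path="dummy_example.json"):
    samples = [{"text": f"dummy utterance {i}",
                "label": random.choice(PLACEHOLDER_LABELS)}
               for i in range(n)]
    with open(path, "w") as f:
        json.dump(samples, f, indent=2)

if __name__ == "__main__":
    generate_dummy()
```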
The code structure leverages PyTorch Lightning to obtain standalone scripts dedicated to specific experiments. We will later refactor it to remove unnecessary repetition (a bit of factorization is needed). Nevertheless, every script follows the same organization: a `CustomDataModule` class deals with dataset-related information, and a `PLBert` class deals with model building.
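To make this organization concrete, here is a minimal, self-contained sketch. Only the class names `CustomDataModule` and `PLBert` come from this repository; the method bodies, defaults, and toy data below are illustrative, not the actual implementation.

```python
# Minimal sketch of the per-script organization. Only the class names
# CustomDataModule and PLBert come from this repository; everything else
# (defaults, toy data, method bodies) is illustrative.
import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

class CustomDataModule(pl.LightningDataModule):
    """Handles dataset-related logic: loading, tokenization, dataloaders."""
    def __init__(self, tokenizer_name="bert-base-uncased", batch_size=16):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
        self.batch_size = batch_size

    def setup(self, stage=None):
        # The real scripts load data/dummy_dataset.json; a toy in-memory
        # sample keeps this sketch self-contained.
        texts, labels = ["first utterance", "second utterance"], [0, 1]
        enc = self.tokenizer(texts, padding=True, truncation=True,
                             return_tensors="pt")
        self.train_set = TensorDataset(enc["input_ids"],
                                       enc["attention_mask"],
                                       torch.tensor(labels))

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size,
                          shuffle=True)

class PLBert(pl.LightningModule):
    """Handles model building and the optimization loop."""
    def __init__(self, model_name="bert-base-uncased", num_labels=2, lr=2e-5):
        super().__init__()
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=num_labels)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        input_ids, attention_mask, labels = batch
        outputs = self.model(input_ids=input_ids,
                             attention_mask=attention_mask, labels=labels)
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)

# Typical usage:
# pl.Trainer(max_epochs=1).fit(PLBert(), datamodule=CustomDataModule())
```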
The folder is organized as follows:

```
.
├── dapt_mlm.py ==> DAPT STEP 1
├── dapt_satisfaction_ft.py ==> DAPT STEP 2 ON SATISFACTION
├── dapt_status_ft.py ==> DAPT STEP 2 ON EITHER PC OR STATUS TASKS
├── data ==> DIRECTORY FOR INPUT DATA
│   ├── dummy_dataset.json ==> DUMMY DATASET
│   └── generate_fake_dataset.py ==> DUMMY DATASET GENERATOR AND PREPROCESSING
├── dataset_utils ==> UTILITIES FOR DATASET HANDLING
│   ├── categories_sampler.py ==> SAMPLER BY CATEGORY (FOR A BALANCED DISTRIBUTION)
│   ├── custom_dataset.py
│   ├── episodic_sampler.py ==> EPISODIC SAMPLER AS TACKLED IN THE PAPER (SKETCHED BELOW)
│   └── imbalanced_sampler.py ==> IMBALANCED SAMPLER USED TO REPORT THE RESULTS IN THE PAPER
├── README.md
├── requirements.txt ==> PYTHON ENVIRONMENT
├── runs.sh ==> MAIN START FILE TO RUN THE EXPERIMENTS
├── speaker_role_pretraining.py ==> TAFT OR TAPT STEP 1
├── taft_target_finetuning.py ==> TAFT STEP 2 WITH MULTITASK FINETUNING OF THE ADAPTIVE LAYER
└── utils.py

3 directories, 18 files
```
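For illustration, a generic episodic (n-way, k-shot) batch sampler could look like the sketch below. The actual logic lives in `dataset_utils/episodic_sampler.py` and may differ; the class name, parameters, and defaults here are hypothetical.

```python
# Hypothetical sketch of an episodic (n-way, k-shot) batch sampler.
# The actual implementation is in dataset_utils/episodic_sampler.py and
# may differ; all names and defaults below are illustrative.
import random
from collections import defaultdict
from torch.utils.data import Sampler

class EpisodicSamplerSketch(Sampler):
    """Yields episodes of n_way classes with k_shot examples each."""
    def __init__(self, labels, n_way=5, k_shot=4, n_episodes=100):
        # Index examples by class (assumes >= n_way classes and
        # >= k_shot examples per class).
        self.by_class = defaultdict(list)
        for idx, label in enumerate(labels):
            self.by_class[label].append(idx)
        self.n_way, self.k_shot = n_way, k_shot
        self.n_episodes = n_episodes

    def __len__(self):
        return self.n_episodes

    def __iter__(self):
        for _ in range(self.n_episodes):
            classes = random.sample(list(self.by_class), self.n_way)
            episode = []
            for c in classes:
                episode.extend(random.sample(self.by_class[c], self.k_shot))
            yield episode  # one batch = one episode of dataset indices
```

Such a sampler is typically passed to a `DataLoader` through its `batch_sampler` argument, so that each batch corresponds to one episode.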
Command lines dedicated to each experiment, organized in sections, are provided in the runs.sh Bash script; simply comment out the commands you do not want to run. Each command comes with a comment indicating the corresponding step from Figure 1 in the paper.

```
bash runs.sh
```