SemEval 2022 Task 4 SubTask 1: Patronizing and Condescending Language Detection

This is the natural language processing coursework repository for our team.

Emotional Damage Team

  • Boyu Han
  • Xinyu Bai
  • Yuze An

Project Dependencies

  • imblearn
  • pytorch
  • simpletransformers
  • transformers
  • tensorboard
  • numpy
  • scipy
  • scikit-learn

Project Directory

├── data `Preprocessed csv data for training and evaluation`
├── loader `Data loader`
├── `Main python script to invoke functions`
├── model `All of our implemented models`
├── resource `Figures and data used to generate the report`
├── runtime `Runtime cache and model checkpoints`
├── script `Scripts used to train on Slurm`
├── spec `Specifications of the task`
├── test `Python unittest directory`
└── util `Data analysis and performance optimization`



python -u [--train int] [--model_name str] [--data_type type]
  • [--train ] selects the run mode; the default value is 1
    • 1: run training, then testing
    • 0: return the cached test results of our final model, DeBERTaV2XLarge
  • [--model_name ] selects the model to use; the default value is DeBERTaV2XLarge
    • The value can be one of [DeBERTaV3Large, DeBERTaV2XLarge, DeBERTaBase, DeBERTaLarge, XLNet, Longformer]
  • [--data_type ] selects the data to use; the default value is clean_upsample
    • clean_upsample: upsampled data with extra quotation marks removed
    • synonym_clean_upsample: upsampled data with extra quotation marks removed, augmented with synonym-based data augmentation
    • plain_upsample: upsampled data
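The data types above all rely on upsampling. This README does not say how the upsampling was done, so the following is only a minimal sketch of one common approach, random oversampling of the minority class, written with the standard library. The function name `random_upsample` and the toy data are hypothetical and are not taken from this repository.

```python
import random
from collections import Counter


def random_upsample(texts, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class.

    Assumption: this is plain random oversampling; the repository
    may use a different strategy (for example via imblearn).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        # All original rows that carry this label
        pool = [t for t, l in zip(texts, labels) if l == label]
        for _ in range(target - count):
            out_texts.append(rng.choice(pool))
            out_labels.append(label)
    return out_texts, out_labels


# Toy example: four negative rows and one positive row
texts = ["a", "b", "c", "d", "e"]
labels = [0, 0, 0, 0, 1]
up_texts, up_labels = random_upsample(texts, labels)
print(Counter(up_labels))
```

Upsampling should only be applied to the training split, so that duplicated rows do not leak into the evaluation data.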

For example, you can train a DeBERTaV2XLarge model on clean_upsample data with:

python -u --train 1 --model_name DeBERTaV2XLarge --data_type clean_upsample
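The flags and defaults described above could be parsed roughly as follows. This is a sketch using `argparse`, assuming the flags are validated against the listed values; the helper name `build_parser` is hypothetical and the repository's actual parsing code may differ.

```python
import argparse

# Values copied from the option list in this README
MODEL_NAMES = ["DeBERTaV3Large", "DeBERTaV2XLarge", "DeBERTaBase",
               "DeBERTaLarge", "XLNet", "Longformer"]
DATA_TYPES = ["clean_upsample", "synonym_clean_upsample", "plain_upsample"]


def build_parser():
    """Build a parser for the three documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Train or evaluate a PCL detection model")
    parser.add_argument("--train", type=int, choices=[0, 1], default=1,
                        help="1: train then test; 0: return cached test results")
    parser.add_argument("--model_name", choices=MODEL_NAMES,
                        default="DeBERTaV2XLarge", help="model to use")
    parser.add_argument("--data_type", choices=DATA_TYPES,
                        default="clean_upsample", help="data variant to use")
    return parser


# Parse the same arguments as the example command above
args = build_parser().parse_args(
    ["--train", "1", "--model_name", "DeBERTaV2XLarge",
     "--data_type", "clean_upsample"])
print(args.train, args.model_name, args.data_type)
```

With `choices` set, an unknown model name or data type is rejected with a usage message instead of failing later during training.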


  • If you use DeBERTaV2XLarge, which is our final model, an extra Bayesian Optimization step is executed to maximize model performance.
  • Because of randomness in model initialization and in the randomized batch sampler, the F1-score may be slightly lower than the one stated in the paper. In addition, we performed early stopping per iteration, which is disabled in this training process to save time. If you run the command directly, the model is trained for 1 epoch and the results are collected afterwards.
  • To reproduce our result, use early stopping at about 4900 training iterations with batch size 3 (slightly less than 1 epoch) and train the model on the full labelled dataset before invoking the Bayesian Optimization methods.
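The notes above mention early stopping per iteration but do not describe the mechanism. One common way to implement it is to evaluate every fixed number of training iterations and stop once the F1-score has stopped improving. The sketch below illustrates that idea only; the function name, the `patience` parameter, and the toy F1 values are all hypothetical and do not come from this project.

```python
def train_with_step_early_stopping(train_step, evaluate, max_steps,
                                   eval_every, patience):
    """Call train_step() up to max_steps times, evaluating every
    eval_every steps. Stop when F1 has not improved for `patience`
    consecutive evaluations. Returns (best_step, best_f1).
    """
    best_f1, best_step, stale = float("-inf"), 0, 0
    for step in range(1, max_steps + 1):
        train_step()
        if step % eval_every != 0:
            continue
        f1 = evaluate()
        if f1 > best_f1:
            # New best checkpoint: remember it and reset the counter
            best_f1, best_step, stale = f1, step, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_step, best_f1


# Toy run: a made-up F1 curve that peaks at the third evaluation
toy_scores = iter([0.50, 0.58, 0.61, 0.60, 0.59, 0.57])
best_step, best_f1 = train_with_step_early_stopping(
    train_step=lambda: None,          # stands in for one optimizer step
    evaluate=lambda: next(toy_scores),
    max_steps=600, eval_every=100, patience=2)
print(best_step, best_f1)
```

For scale, 4900 iterations at batch size 3 corresponds to 4900 × 3 = 14,700 training examples seen.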

Run unittest

For example, you can run the data loader unit test with:

python -m unittest test.DataLoaderTest.LoaderTestCase.test_loader

Run LongformerLarge model training with:

python -m unittest test.LongformerLargeTest.LongformerLargeTestCase.test_train

More tests are located in the test folder.

Our result

All results are stored in the resource folder, including all figures, prediction files, and labels.

Our final submission with final dataset on CodaLab

Precision  Recall  F1-Score
0.6154     0.6309  0.6231
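As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so the third value can be recomputed from the first two. The helper name below is hypothetical.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values from the table above
f1 = f1_from_precision_recall(0.6154, 0.6309)
print(round(f1, 4))
```

Rounded to four decimal places, the recomputed value matches the reported F1-score of 0.6231.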

Our final ranking on CodaLab Post-Evaluation Section

[Figure: our final ranking]

Statistics for training dataset

[Figure: sequence length distribution]

Reason to perform early-stopping

[Figure: model performance w.r.t. checkpoints]

Bayesian Optimization steps

[Figure: Bayesian Optimization steps]