Class-Based-Regularization-Effects-in-NLP

An experimental study of the effects of neural network regularization on class-based accuracies in multiclass classification, specifically for sequential models and language data.


Class Dependent Effects of Regularization in Sequential Models and Language Data

Various forms of regularization are used to prevent over-fitting in nearly every widely used neural network model. Recent work, specifically the paper The Effects of Regularization and Data Augmentation are Class Dependent by Balestriero, Bottou, and LeCun, has shown that while regularization can improve overall accuracy in image classification, the accuracy of certain classes can drop drastically, even with uninformed regularizers like weight decay. This study explores whether these class-specific biases caused by regularization also appear in Natural Language Processing (NLP) classification tasks. It evaluates sequential models of varying complexity, including an RNN, an LSTM, and a pretrained BERT model, on datasets with different numbers of classes, training each with different types of uninformed regularization. These experiments show empirically that more complex models, such as LSTMs and BERT, trained and finetuned on datasets with many classes are more prone to class biases.
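As an illustration of what "uninformed" regularization looks like in practice, the sketch below applies two such regularizers, weight decay (L2) and dropout, to a small LSTM text classifier in PyTorch. The model, names, and hyperparameters are illustrative assumptions and do not reproduce the exact configurations used in the notebooks.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Minimal LSTM classifier used to illustrate uninformed regularization."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout)           # dropout: applied uniformly, not class-aware
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)         # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)         # keep only the final hidden state
        return self.fc(self.dropout(hidden[-1]))     # logits over the classes

# Illustrative hyperparameters, not the values used in the study.
model = LSTMClassifier(vocab_size=20_000, embed_dim=128, hidden_dim=256, num_classes=5)

# Weight decay (L2) penalizes all parameters equally, i.e. it is
# "uninformed" about which classes it might help or hurt.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()
```

Varying `dropout` and `weight_decay` across runs, while holding everything else fixed, is the kind of sweep the training notebooks perform for each regularizer.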

Code

The code for the study is organized into two directories, BERT and RNN-LSTM.

The BERT directory contains two files: bert_train.ipynb and bert_evaluate.ipynb. The bert_train.ipynb notebook loads a pretrained BERT model and the corresponding datasets and finetunes the model on the Masked Language Modeling task under different levels of L2 and Dropout regularization. The bert_evaluate.ipynb notebook loads the saved finetuned BERT models, evaluates their class-specific test accuracies, and generates a plot of class accuracies at different levels of regularization.

The RNN-LSTM directory contains three files: rnn_lstm_train.ipynb, rnn_lstm_evaluate.ipynb, and rnn_lstm_plots.ipynb. The rnn_lstm_train.ipynb notebook loads and preprocesses the datasets, initializes the RNN and LSTM models, and trains them for multiclass classification under different levels of L1, L2, Dropout, and DropConnect regularization. The rnn_lstm_evaluate.ipynb notebook loads the saved RNN and LSTM models and evaluates their class-specific test accuracies at each level of regularization. The rnn_lstm_plots.ipynb notebook generates plots of the class-specific test accuracies for the different models.

The code for the RNN and LSTM models was written by me, and the code for the BERT model was written by my project collaborator, Noah McDermott.
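Both evaluation notebooks report per-class test accuracy rather than a single aggregate number, since the question of interest is whether regularization hurts some classes while improving overall accuracy. The sketch below shows one way such a per-class evaluation could be written, assuming a standard PyTorch model and a test DataLoader yielding (token_ids, labels) batches; the function name and interface are illustrative, not taken from the notebooks.

```python
import torch

@torch.no_grad()
def per_class_accuracy(model, test_loader, num_classes, device="cpu"):
    """Return a list of per-class test accuracies for a multiclass classifier."""
    model.eval()
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    for token_ids, labels in test_loader:
        token_ids, labels = token_ids.to(device), labels.to(device)
        preds = model(token_ids).argmax(dim=-1)   # predicted class per example
        for c in range(num_classes):
            mask = labels == c                    # examples whose true label is class c
            total[c] += mask.sum().item()
            correct[c] += (preds[mask] == c).sum().item()
    # Guard against classes that never appear in the test set.
    return [(correct[c] / total[c]).item() if total[c] > 0 else float("nan")
            for c in range(num_classes)]
```

Comparing these per-class accuracy lists across regularization strengths is what makes any class-dependent effects visible in the generated plots.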