BiLSTM-CRF and DistilRoBERTa Models for Legal Named Entity Recognition


Project Description

This repository contains the final project for the course 'Advanced Natural Language Processing' of the M.Sc. Cognitive Systems: Language, Learning and Reasoning at Universität Potsdam. The project deals with SemEval-2023 task 6: LegalEval, subtask B: Legal Entity Recognition (L-NER). You can find the paper presenting this task here. This repository was developed by Guillem Gili i Bueno, Yi-Sheng Hsu and Delfina Jovanovich Trakál.

In this project, we propose two models for L-NER: a bidirectional long short-term memory network with a conditional random field layer (BiLSTM-CRF) and a pretrained DistilRoBERTa model.
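
For orientation, here is a minimal sketch of a BiLSTM-CRF tagger in PyTorch. It is not the repository's implementation (which follows the PyTorch tutorial listed in the References); it uses the third-party pytorch-crf package for the CRF layer, and the dimensions and tag counts are placeholders.

import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf (an assumption, not necessarily what the repo uses)

class BiLSTMCRF(nn.Module):
    """Token embeddings -> BiLSTM -> per-token emission scores -> CRF over tag sequences."""

    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=256, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2, batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, token_ids, tags=None, mask=None):
        feats = self.emissions(self.lstm(self.embedding(token_ids))[0])
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence under the CRF
            return -self.crf(feats, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the best tag sequence per sentence
        return self.crf.decode(feats, mask=mask)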

Requirements

The packages required to run this project can be found in requirements.txt.

$ pip install -r requirements.txt

Make sure your Python version is compatible with PyTorch.

Dataset

The data was collected by the creators of SemEval-2023 task 6. It is divided into two categories, judgement and preamble, which differ in entity types and frequencies. The .json files can be found under src/data. More details on the data extraction and annotation processes can be found in the base paper linked above.
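
To get a feel for the annotation format, you can load one of the files and inspect a record. The file name below is only an example (use whichever judgement or preamble file is under src/data), and the snippet assumes the file holds a list of annotated records.

import json

# File name is illustrative; pick any of the .json files under src/data
with open("src/data/NER_TRAIN_JUDGEMENT.json", encoding="utf-8") as f:
    records = json.load(f)

print(len(records))
print(records[0])  # one raw record, including its text and entity annotations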

Setup

Training Prerequisites

First, split the data into training, validation, and test sets:

$ python src/main.py --split_datasets

The new files will also be saved under src/data.
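
Conceptually, this is an ordinary train/validation/test partition. The sketch below illustrates the idea with scikit-learn; the ratios and random seed are placeholders, not the values used in src/main.py.

from sklearn.model_selection import train_test_split

records = list(range(100))                    # stand-in for the annotated documents
train, rest = train_test_split(records, test_size=0.2, random_state=42)
dev, test = train_test_split(rest, test_size=0.5, random_state=42)
print(len(train), len(dev), len(test))        # 80 / 10 / 10 with these placeholder ratios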

BiLSTM-CRF

This method requires pretrained word embeddings. Download the pretrained GloVe word embeddings:

$ python src/main.py --download_glove
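
The embeddings are plain text files with one word and its vector per line. Below is a minimal way to read them, assuming the common glove.6B.100d.txt file; the exact file the project downloads may differ.

import numpy as np

# File name and dimensionality are assumptions; adapt them to the file --download_glove fetches
embeddings = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        word, *values = line.rstrip().split(" ")
        embeddings[word] = np.asarray(values, dtype=np.float32)

print(embeddings["court"].shape)  # (100,)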

DistilRoBERTa

In this case we need the pretrained distilroberta-base model, since it is the checkpoint we fine-tune on our data. The code in src/roberta.py automatically downloads the pretrained model from Hugging Face, so there is no need to run any explicit commands. However, note that the first time this code is run, it may take a while to download the model.
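
For reference, loading the checkpoint boils down to the standard Transformers calls below. The number of labels is a placeholder (it must match the project's tag set), and the actual setup in src/roberta.py may differ.

from transformers import AutoTokenizer, AutoModelForTokenClassification

# First call downloads and caches the checkpoint from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
# num_labels is an assumption: set it to the size of your BIO tag set
model = AutoModelForTokenClassification.from_pretrained("distilroberta-base", num_labels=29)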

It is also worth noting that for RoBERTa the batch sizes are hardcoded, since we had to cater to our GPU limitations (NVIDIA GeForce GTX 1650). The current batch sizes are 4 for training and 48 for validation, declared at the top of src/roberta.py as BATCH_SIZE_TRAIN_CONCURRENT and BATCH_SIZE_VALIDATE_CONCURRENT. Feel free to adjust them if you are running out of GPU memory or want training to run faster.
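
These constants simply set the DataLoader batch sizes, roughly as sketched below; this is a simplified illustration with a stand-in dataset, not the code in src/roberta.py.

import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE_TRAIN_CONCURRENT = 4       # increase if your GPU has more memory
BATCH_SIZE_VALIDATE_CONCURRENT = 48   # validation needs no gradients, so it can be larger

dummy = TensorDataset(torch.zeros(16, 8, dtype=torch.long))   # stand-in for the tokenized dataset
train_loader = DataLoader(dummy, batch_size=BATCH_SIZE_TRAIN_CONCURRENT, shuffle=True)
val_loader = DataLoader(dummy, batch_size=BATCH_SIZE_VALIDATE_CONCURRENT)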

Training

Models are saved in the folder src/generated_models and plots in the folder src/plots. For RoBERTa, where to save the plots can be specified with the --round parameter (1, 2, or another value), which places the plots in round1_roberta, round2_roberta, or a correspondingly named folder inside src/plots.

Initialize either a BiLSTM-CRF or a RoBERTa model using the --bilstm_crf or the --roberta argument. For training, specify the number of epochs, the batch size (BiLSTM-CRF only), and the learning rate with the parameters --epochs, --batch_size, and --lr. Choose either the judgement or the preamble dataset with the argument --dataset. Here are the base examples:

$ python3 src/main.py --bilstm_crf --epochs 25 --batch_size 256 --lr 0.05 --dataset judgement
$ python3 src/main.py --roberta  --epochs 10 --lr 0.00005 --dataset preamble --round 1
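
For orientation, the flags above could be wired roughly as follows. This is an illustrative argparse sketch; the real argument handling lives in src/main.py and may differ in defaults and details.

import argparse

parser = argparse.ArgumentParser(description="Train an L-NER model")
parser.add_argument("--bilstm_crf", action="store_true")
parser.add_argument("--roberta", action="store_true")
parser.add_argument("--epochs", type=int, default=10)
parser.add_argument("--batch_size", type=int, default=256)   # used by the BiLSTM-CRF only
parser.add_argument("--lr", type=float, default=0.05)
parser.add_argument("--dataset", choices=["judgement", "preamble"])
parser.add_argument("--round", type=int, default=1)          # RoBERTa plot folder
args = parser.parse_args()
print(args)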

Testing and Evaluation

Run $ python src/main.py --evaluate_model to test and evaluate either model on the judgement or preamble dev data. Pass the name of the saved model after --evaluate_model and the dataset to test on (judgement or preamble) after --model. We use the F1 score for evaluation. Here is an example:

$ python src/main.py --evaluate_model bilstm_crf.judgement.e25.bs256.lr0.05 --model judgement
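
Entity-level F1 for NER can be computed with the seqeval metric from the Hugging Face evaluate library (see References). The snippet below is a generic illustration on a toy example, not the repository's evaluation code.

import evaluate

seqeval = evaluate.load("seqeval")  # requires the seqeval package
predictions = [["O", "B-COURT", "I-COURT", "O"]]
references  = [["O", "B-COURT", "I-COURT", "O"]]
results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_f1"])  # 1.0 for this toy example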

Reproducing our results

Training

Models will be saved in the folder src/generated_models and plots in the folder src/plots.

DistilRoBERTa

You may need to make the scripts executable:

$ chmod 755 generate_models_roberta.sh
$ chmod 755 generate_models_roberta_final.sh

To replicate the models and plots from the first round of experiments (where we test different learning rates for 10 epochs each; note that this will take a while!):

$ ./generate_models_roberta.sh

On an NVIDIA GeForce GTX 1650, this took 2 hours for the preamble models and 8 hours for the judgement models, so you may want to edit generate_models_roberta.sh to run only 5 epochs.

To replicate the models and plots from the second round of experiments (where we only train the 2 best models, one for preamble and one for judgement):

$ ./generate_models_roberta_final.sh

This took about an hour on an NVIDIA GeForce GTX 1650.

BiLSTM-CRF

You may need to make the script executable:

$ chmod 755 generate_models_bilstm_crf.sh

Then simply run the following script:

$ ./generate_models_bilstm_crf.sh

The script performs the data split, downloads the GloVe embeddings, and trains the models with the hyperparameters we mainly used: epochs=25, batch_size=256, lr=0.01 (preamble) or 0.05 (judgement).

It takes around 7 hours to train the models on both datasets on a MacBook Pro (M1, 2021). It is also possible to edit generate_models_bilstm_crf.sh to make adjustments such as reducing the number of epochs. Preliminary results for this model can usually be observed after around 15 epochs.

Evaluation

This part tests a model and writes the resulting CSV files under src/evaluation_logs/. Once again, you may need to make the script executable:

$ chmod 755 evaluate_models.sh

Then simply run it; it will generate the CSV files and print the results to the terminal:

$ ./evaluate_models.sh

References

  1. Advanced: Making Dynamic Decisions and the Bi-LSTM CRF | PyTorch Tutorials
  2. F1-Score | Hugging Face evaluation Library
  3. Transformer Token Classification | Hugging Face Transformer Token Classification
  4. pytorch-RoBERTa-named-entity-recognition | Kaggle RoBERTa Model