Keyphrase Extraction using BERT

Deep keyphrase extraction using BERT.

Run in Colab

Link to the Notebook

Usage

Clone this repository and install `pytorch-pretrained-BERT`.
Adjust the parameters in `experiments/base_model/params.json`. If your GPU has about 11 GB of VRAM, we recommend a batch size of 4, a sequence length of 512, and 6 epochs.
For training, run `python train.py`.
For evaluation, run `python evaluate.py`.
To run prediction on the `data/h1_7.txt` file, run `python keyphrase_task.py`.
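If you prefer to set the parameters from a script rather than editing the file by hand, a minimal sketch could look like the following. The helper `update_params` and the key names in the usage comment (`batch_size`, `max_len`, `epoch_num`) are assumptions made for illustration; check `experiments/base_model/params.json` for the key names the repository actually uses.

```python
import json
from pathlib import Path


def update_params(path, **overrides):
    """Load a JSON params file, apply overrides, write it back, return the result."""
    path = Path(path)
    params = json.loads(path.read_text())
    params.update(overrides)
    path.write_text(json.dumps(params, indent=4))
    return params


# Example call (key names are hypothetical, see the note above):
# update_params("experiments/base_model/params.json",
#               batch_size=4, max_len=512, epoch_num=6)
```

Keys that are not overridden keep their existing values, so the rest of the configuration is left untouched.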

Python version 3.7

Results

Subtask 1: Keyphrase Boundary Identification Using BERT

We used the IO tagging format here: each token is labeled I (inside a keyphrase) or O (outside).

On the test set, we got:

F1 score: 0.3799
Precision: 0.2992
Recall: 0.5201
Support: 921
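
As a quick sanity check, the reported F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The function name `f1_score` below is only illustrative and is not taken from the repository's evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the test set.
f1 = f1_score(0.2992, 0.5201)
print(round(f1, 4))  # 0.3799
```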

Prediction on the given text file, `data/h1_7.txt`:

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
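
To turn a tag sequence like this into keyphrases, consecutive `I` tags can be grouped into spans. The sketch below is an illustration, not the repository's own post-processing; the helper name `io_tags_to_spans` and the toy tokens are assumptions. Note that the IO format cannot separate two keyphrases that are directly adjacent, because there is no tag that marks the beginning of a phrase.

```python
def io_tags_to_spans(tags):
    """Return (start, end) token index pairs, end exclusive, for each run of 'I' tags."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "I" and start is None:
            start = i  # a new run of 'I' tags begins here
        elif tag != "I" and start is not None:
            spans.append((start, i))  # the run ended at the previous token
            start = None
    if start is not None:
        spans.append((start, len(tags)))  # run extends to the end of the sequence
    return spans


# Toy example (hypothetical tokens, not taken from data/h1_7.txt).
tokens = ["deep", "learning", "is", "used", "for", "keyphrase", "extraction"]
tags = ["I", "I", "O", "O", "O", "I", "I"]
phrases = [" ".join(tokens[s:e]) for s, e in io_tags_to_spans(tags)]
print(phrases)  # ['deep learning', 'keyphrase extraction']
```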

Changes after the test on Aug 7, 2021

Implemented a custom BERT model that applies Bi-LSTM and CRF layers on top of the pre-trained bert-base-uncased model (PyTorch)
Added class weights to the loss function to implement a weighted CrossEntropyLoss
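
To illustrate what class weighting does to the loss, here is a dependency-free sketch of a weighted cross-entropy over per-token scores. It follows the same weighted-mean formula that `torch.nn.CrossEntropyLoss(weight=...)` uses with its default `mean` reduction, where the sum of weighted losses is divided by the sum of the target weights. The function name and the example weights are assumptions; the weights actually used in this repository are not shown here.

```python
import math


def weighted_cross_entropy(logits, targets, class_weights):
    """Weighted mean of per-token negative log-likelihoods.

    logits: list of per-token score lists, one score per class
    targets: list of gold class indices, one per token
    class_weights: list with one weight per class
    """
    total_loss = 0.0
    total_weight = 0.0
    for scores, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        nll = log_z - scores[target]
        w = class_weights[target]
        total_loss += w * nll
        total_weight += w
    return total_loss / total_weight


# Two classes: index 0 = O, index 1 = I. The weights are hypothetical.
logits = [[2.0, 0.0], [2.0, 0.0]]
targets = [0, 1]
unweighted = weighted_cross_entropy(logits, targets, [1.0, 1.0])
weighted = weighted_cross_entropy(logits, targets, [1.0, 3.0])
print(unweighted < weighted)  # True: the misclassified I token now counts more
```

Class weighting of this kind is commonly used when one tag dominates the data, since it raises the cost of errors on the rarer class.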

Credits

This is a modified version of the original repo pranav-ust/BERT-keyphrase-extraction

Sak-sh-i/BERT-keyphrase-extraction