JLSD implementation

This is an attempt to reimplement the JLSD technique from the paper "A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents" (Tuan Manh Lai, Trung Bui, Doo Soon Kim, Quan Hung Tran).

Usage

Install the dependencies: python -m pip install -r requirements.txt
From the SciBERT repo, untar the weights and the vocab file into a new folder named model, and rename the weight dump file to pytorch_model.bin.
Adjust the parameters in experiments/base_model/params.json as needed. If the GPU has around 11 GB of VRAM, we recommend a batch size of 4, a sequence length of 512, and 6 epochs.
See script.sh for the training and testing commands.
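The parameter step can also be scripted. The following is a minimal sketch, not part of this repo: update_params is a hypothetical helper, and the key names batch_size, max_len, and epoch_num are assumptions, so check them against the keys actually present in experiments/base_model/params.json. The demo runs against a temporary file with placeholder values so the real config is left untouched.

```python
import json
import tempfile
from pathlib import Path


def update_params(path, **overrides):
    """Load a JSON params file, apply the overrides, and write it back.

    Hypothetical helper; the key names passed in are assumptions, not
    the keys this repo is known to use.
    """
    path = Path(path)
    params = json.loads(path.read_text()) if path.exists() else {}
    params.update(overrides)
    path.write_text(json.dumps(params, indent=4))
    return params


# Demo on a temporary file holding placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "params.json"
    demo_path.write_text(json.dumps({"learning_rate": 3e-5, "batch_size": 32}))

    # Recommended settings for a GPU with around 11 GB of VRAM.
    updated = update_params(demo_path, batch_size=4, max_len=512, epoch_num=6)
    reloaded = json.loads(demo_path.read_text())
```

To use it on the real config, pass experiments/base_model/params.json as the path once the key names have been confirmed.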

Django REST API

The kwsite folder contains code for serving the model through a REST API built with Django. To run it: python manage.py runserver

Todo

So far we have only tried a linear layer on top of the BERT embeddings. We still need to check whether SciBERT + BiLSTM + CRF makes a difference.
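For orientation, a linear layer on top of BERT embeddings can be sketched roughly as below. This is an illustrative sketch and not the code in this repo: the class name LinearTaggingHead, the three-tag label set, and the dropout rate are assumptions, and a random tensor stands in for the SciBERT output. The hidden size of 768 matches BERT-base, which SciBERT follows.

```python
import torch
from torch import nn


class LinearTaggingHead(nn.Module):
    """Per-token classifier: dropout followed by a single linear layer.

    Hypothetical name; num_tags=3 assumes a B/I/O-style label set.
    """

    def __init__(self, hidden_size=768, num_tags=3, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_tags)

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, hidden_size)
        # returns logits: (batch, seq_len, num_tags)
        return self.classifier(self.dropout(embeddings))


torch.manual_seed(0)

# Random stand-in for encoder output: batch of 4, sequence length 512.
embeddings = torch.randn(4, 512, 768)

head = LinearTaggingHead()
head.eval()  # disable dropout for a deterministic forward pass
with torch.no_grad():
    logits = head(embeddings)

# Highest-scoring tag index for each token.
predicted_tags = logits.argmax(dim=-1)
```

A BiLSTM + CRF variant would replace this head with a recurrent layer and a structured decoding step, which is the comparison the item above leaves open.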

Credits

SciBERT: https://github.com/allenai/scibert
HuggingFace: https://github.com/huggingface/pytorch-pretrained-BERT
PyTorch NER: https://github.com/lemonhu/NER-BERT-pytorch
BERT: https://github.com/google-research/bert

Reference

https://github.com/pranav-ust/BERT-keyphrase-extraction
