BIOBERT-Relation-Extraction

For more detail, please go through this. PyTorch implementation.

Relation Extraction

BIOBERT vs BERT

  • Pretrained BioBERT model link

Requirements and Setup

  • Python version
  • spaCy version
  • PyTorch version

Clone the repository:

$ git clone https://github.com/seemapatel151997/BIOBERT-Relation-Extraction.git

Install all required Python packages:

$ pip install -r requirements.txt

To use BioBERT (biobert_v1.1_pubmed), download and unzip the pretrained model into the ./additional_models folder.

Run the bash script to convert the model from its TensorFlow version to a PyTorch version.
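
Before running the conversion, it can help to confirm that the unzipped checkpoint is where it is expected. The following is a minimal sketch, not part of this repository: the file names (bert_config.json, vocab.txt, and a `*.index` checkpoint file) are assumptions based on how TensorFlow BERT checkpoints are usually packaged, and the subfolder name under ./additional_models is hypothetical.

```python
from pathlib import Path

# Assumed file names, typical of TensorFlow BERT releases; adjust them to
# match what the downloaded archive actually contains.
REQUIRED_FILES = ("bert_config.json", "vocab.txt")


def missing_checkpoint_files(model_dir):
    """Return the names of expected files that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # A TensorFlow checkpoint normally ships an index file (model.ckpt-*.index).
    if not (root.is_dir() and any(root.glob("*.index"))):
        missing.append("*.index")
    return missing


if __name__ == "__main__":
    # Hypothetical location: the README only names the ./additional_models folder.
    problems = missing_checkpoint_files("./additional_models/biobert_v1.1_pubmed")
    print("missing:", problems if problems else "nothing")
```

An empty result only means the expected files are present. It does not validate their contents.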

Fine-Tuning

Run main_task.py with the arguments below. This requires the SemEval2010 Task 8 dataset, available here. Download and unzip it into the ./data/ folder.

$ python main_task.py \
    --train_data ./data/SemEval2010_task8_all_data/SemEval2010_task8_training/TRAIN_FILE.TXT \
    --test_data ./data/SemEval2010_task8_all_data/SemEval2010_task8_testing_keys/TEST_FILE_FULL.TXT \
    --use_pretrained_blanks 0 \
    --num_classes 19 \
    --batch_size 128 \
    --gradient_acc_steps 2 \
    --max_norm 1.0 \
    --fp16 0 \
    --num_epochs 50 \
    --lr 0.00001 \
    --model_no 2 \
    --model_size 'bert-base-uncased' \
    --train 1 \
    --infer 0
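
For orientation, the SemEval2010 Task 8 files store each example as a numbered, quoted sentence with `<e1>` and `<e2>` tags, followed by a relation line, a `Comment:` line, and a blank line. The 19 classes correspond to 9 relation types in both directions plus `Other`. The sketch below is not the repository's own loader: `parse_semeval` and `SAMPLE` are illustrative names, and the two records are only examples written in the dataset's layout.

```python
import re

ENTITY_TAGS = re.compile(r"</?e[12]>")

# Two illustrative records in the dataset's four-line layout.
SAMPLE = [
    '1\t"The system as described above has its greatest application in an '
    'arrayed <e1>configuration</e1> of antenna <e2>elements</e2>."',
    "Component-Whole(e2,e1)",
    "Comment: Not a collection: there is structure here, organisation.",
    "",
    '2\t"The <e1>child</e1> was carefully wrapped and bound into the '
    '<e2>cradle</e2> by means of a cord."',
    "Other",
    "Comment:",
    "",
]


def parse_semeval(lines):
    """Return (plain_sentence, e1_text, e2_text, relation) for each record."""
    lines = [line.rstrip("\n") for line in lines]
    records = []
    for i, line in enumerate(lines):
        sent_id, _, text = line.partition("\t")
        # A record starts with a numeric id, a tab, then the quoted sentence.
        if not sent_id.isdigit() or not text or i + 1 >= len(lines):
            continue
        text = text.strip()
        if text.startswith('"') and text.endswith('"'):
            text = text[1:-1]
        e1 = re.search(r"<e1>(.*?)</e1>", text)
        e2 = re.search(r"<e2>(.*?)</e2>", text)
        if e1 is None or e2 is None:
            continue  # skip malformed records instead of failing
        relation = lines[i + 1].strip()
        records.append((ENTITY_TAGS.sub("", text), e1.group(1), e2.group(1), relation))
    return records


for sentence, e1, e2, relation in parse_semeval(SAMPLE):
    print(f"{relation}: e1={e1!r}, e2={e2!r}")
```

To inspect the real training file, the same function can be called on an open file handle, for example `parse_semeval(open(path, encoding="utf-8"))`.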

Inference

The script can also automatically detect potential entities in an input sentence, in which case all possible relation combinations are inferred.

Set --infer 1 to enable inference mode and --train 0 to disable training mode. Set --model_no 2 to use the fine-tuned BioBERT model.

$ python main_task.py \
    --train 0 \
    --infer 1 \
    --model_no 2 \
    --input_sent "The COVID-19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID‑19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)."
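
The idea of inferring "all possible relation combinations" can be sketched as follows: once entity spans are known, every ordered pair of entities is tagged in turn and each tagged sentence is classified separately. This is a minimal illustration under stated assumptions, not the script's actual code. The `[E1]`/`[E2]` marker style and the `tag_entity_pairs` helper are assumptions, and the entity spans are hard-coded here rather than detected automatically.

```python
from itertools import permutations


def tag_entity_pairs(tokens, entity_spans):
    """Yield one tagged sentence per ordered pair of non-overlapping spans.

    tokens: list of word strings.
    entity_spans: list of (start, end) token indices, end exclusive.
    """
    for (s1, e1), (s2, e2) in permutations(entity_spans, 2):
        if s1 < e2 and s2 < e1:
            continue  # overlapping spans cannot both be tagged
        out = []
        for i, tok in enumerate(tokens):
            if i == s1:
                out.append("[E1]")
            if i == s2:
                out.append("[E2]")
            out.append(tok)
            if i == e1 - 1:
                out.append("[/E1]")
            if i == e2 - 1:
                out.append("[/E2]")
        yield " ".join(out)


tokens = "COVID-19 is caused by SARS-CoV-2".split()
spans = [(0, 1), (4, 5)]  # hand-picked spans standing in for detected entities
for tagged in tag_entity_pairs(tokens, spans):
    print(tagged)
```

With n detected entities this produces up to n × (n − 1) tagged sentences, so the number of candidates grows quickly for long inputs.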

References