
Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT

Authors: Zaiqiao Meng, Fangyu Liu, Thomas Hikaru Clark, Ehsan Shareghi, Nigel Collier.

Code for our paper Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT (EMNLP 2021).

News:

[26 August 2021] - Our paper has been accepted to appear at EMNLP 2021 as a short paper.


Introduction

Infusing factual knowledge into pre-trained models is fundamental for many knowledge-intensive tasks. In this paper, we propose Mixture-of-Partitions (MoP), an infusion approach that can handle a very large knowledge graph (KG) by partitioning it into smaller sub-graphs and infusing their specific knowledge into various BERT models using lightweight adapters. To leverage the overall factual knowledge for a target task, these sub-graph adapters are further fine-tuned along with the underlying BERT through a mixture layer. We evaluate MoP with three biomedical BERTs (SciBERT, BioBERT, PubMedBERT) on six downstream tasks (including NLI, QA, and classification). The results show that MoP consistently improves the task performance of the underlying BERTs and achieves new SOTA results on five of the evaluated datasets.
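For intuition, the sketch below (PyTorch) shows one way the mixture idea can be realised: each sub-graph gets its own lightweight bottleneck adapter, and a small gating layer produces softmax mixture weights that combine the adapter outputs on top of the BERT hidden states. This is only an illustrative sketch under assumed names and sizes (SubGraphAdapter, AdapterMixture, the bottleneck width, and the temperature-scaled softmax gate are all hypothetical), not the repository's implementation, which lives in src/adapter-transformers and src/knowledge_infusion.

# Illustrative sketch only (not the repository code): a mixture over
# partition-specific adapters applied to BERT hidden states.
import torch
import torch.nn as nn

class SubGraphAdapter(nn.Module):
    """Lightweight bottleneck adapter for one KG partition (hypothetical class)."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))  # residual adapter

class AdapterMixture(nn.Module):
    """Combines K partition adapters with temperature-scaled softmax weights."""
    def __init__(self, hidden_size: int, n_partitions: int, temperature: float = 1.0):
        super().__init__()
        self.adapters = nn.ModuleList(
            [SubGraphAdapter(hidden_size) for _ in range(n_partitions)]
        )
        self.gate = nn.Linear(hidden_size, n_partitions)
        self.temperature = temperature

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_size) hidden states from the underlying BERT
        outputs = torch.stack([adapter(h) for adapter in self.adapters], dim=-2)
        weights = torch.softmax(self.gate(h) / self.temperature, dim=-1)
        return (weights.unsqueeze(-1) * outputs).sum(dim=-2)

# Example: 20 partitions (matching the S20Rel setup below), BERT-base hidden size 768.
mixture = AdapterMixture(hidden_size=768, n_partitions=20)
hidden_states = torch.randn(2, 16, 768)
print(mixture(hidden_states).shape)  # torch.Size([2, 16, 768])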



File structure

  • data_dir: downstream task datasets used in the experiments.
  • kg_dir: folder to save the knowledge graphs as well as the partitioned files.
  • model_dir: folder to save pre-trained models.
  • src: source code.
    • adapter-transformers: a fork of adapter-transformers v1.1.1, modified to support different mixture approaches.
    • evaluate_tasks: code for the downstream tasks.
    • knowledge_infusion: main code for knowledge infusion.

kg_dir and model_dir can be downloaded at this link.

Installation

The code is tested with Python 3.8.5, torch 1.7.0 and Hugging Face transformers 3.5.0; please see requirements.txt for more details. Our models use a modified adapter-transformers. To install it, run pip install . inside the ./src/adapter-transformers folder of this project.
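For a quick sanity check of the environment, a short script like the one below can be used. This is only a sketch: it assumes the modified adapter-transformers fork is importable under the usual transformers module name (as adapter-transformers releases of that generation install), and the exact version strings may differ slightly from those listed above.

# Environment sanity check for the versions mentioned above (Python 3.8.5,
# torch 1.7.0, transformers 3.5.0). Assumption: the modified adapter-transformers
# fork installs under the usual `transformers` module name.
import sys

import torch
import transformers

print("python      :", sys.version.split()[0])    # expected: 3.8.x
print("torch       :", torch.__version__)         # expected: 1.7.0
print("transformers:", transformers.__version__)  # expected: ~3.5.x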

Datasets

Train knowledge infusion and downstream tasks

Train Knowledge Infusion

To train knowledge infusion, you can run the following command in the src/knowledge_infusion/entity_prediction folder.

MODEL="microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
TOKENIZER="microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
INPUT_DIR="kg_dir"
OUTPUT_DIR="model_dir"
DATASET_NAME="S20Rel"
ADAPTER_NAMES="entity_predict"
PARTITION=20

python run_pretrain.py \
--model $MODEL \
--tokenizer $TOKENIZER \
--input_dir $INPUT_DIR \
--data_name $DATASET_NAME \
--output_dir $OUTPUT_DIR \
--n_partition $PARTITION \
--use_adapter \
--non_sequential \
--adapter_names $ADAPTER_NAMES \
--amp \
--cuda \
--num_workers 32 \
--max_seq_length 64 \
--batch_size 256 \
--lr 1e-04 \
--epochs 1 \
--save_step 2000

Train Downstream Tasks

To evaluate the model on a downstream task, go to the corresponding task folder and see its *.sh file for an example. For instance, the following command trains a model on the PubMedQA dataset over different shuffle_rates.

MODEL="microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
TOKENIZER="microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
ADAPTER_NAMES="entity_predict"
PARTITION=20
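# set INPUT_DIR and OUTPUT_DIR before running (e.g. the task's data folder and a folder for saved models)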
python run_pretrain.py \
 --model $MODEL \
 --tokenizer $TOKENIZER \
 --input_dir $INPUT_DIR \
 --output_dir $OUTPUT_DIR \
 --n_partition $PARTITION \
 --use_adapter \
 --non_sequential \
 --adapter_names $ADAPTER_NAMES \
 --amp \
 --cuda \
 --num_workers 32 \
 --max_seq_length 64 \
 --batch_size 256 \
 --bi_direction \
 --lr 1e-04 \
 --epochs 2 \
 --save_step 2000
# loop over shuffle_rate values as needed (see the task's *.sh script)

Hyper-parameters

Pre-train

Parameter        Value
lr               1e-04
epoch            1-2
batch_size       256
max_seq_length   64

BioASQ7b, BioASQ8b, PubMedQA

Parameter        Value
lr               1e-05
epoch            25
patient          5
batch_size       8
max_seq_length   512
repeat_run       10

MedQA

Parameter        Value
lr               1e-05, 2e-05
epoch            25
patient          5
batch_size       12
max_seq_length   512
repeat_run       3
temperature      1

MedNLI

Parameter        Value
lr               1e-05
epoch            25
patient          5
batch_size       16
max_seq_length   256
repeat_run       3
temperature      -15, -10, 1

HoC

Parameter        Value
lr               1e-05, 3e-05
epoch            25
patient          5
batch_size       16, 32
max_seq_length   256
repeat_run       5
temperature      1

If you find our paper and resources useful, please kindly cite our paper:

@inproceedings{meng2021mixture,
  title={Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT},
  author={Meng, Zaiqiao and Liu, Fangyu and Clark, Thomas and Shareghi, Ehsan and Collier, Nigel},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  pages={4672--4681},
  year={2021}
}

Contact

If you have any questions, feel free to contact me at zm324@cam.ac.uk.