We introduce a novel schema for mechanisms that generalizes across many types of activities, functions and influences in scientific literature. This repository contains models, datasets and experiments described in our NAACL 2021 paper: Extracting a Knowledge Base of Mechanisms from COVID-19 Papers.
- Please cite our paper if you use our datasets or models in your project. See the BibTeX.
- Feel free to email us.
Check out the DYGIE++ repo for an example notebook that loads a model trained on MECHANIC and extracts relations through a spaCy interface.
We provide two annotated datasets:
- Coarse-grained mechanism relations (Direct and Indirect)
- Granular mechanism relations (Subject-Predicate-Object)
From project root, run scripts/data/get_mechanic.sh to download both datasets to the data directory.
- Coarse-grained relations will be downloaded to data/mechanic/coarse/[train,dev,test].json. Development and test sets are also available in tabular format: data/mechanic/coarse-gold/[dev,test]-gold.tsv
- Granular relations will be downloaded to data/mechanic/granular/[train,dev,test].json. Tabular format: data/mechanic/granular-gold/[dev,test]-gold.tsv
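As a quick sanity check after downloading, the sketch below reads a dataset file and counts relation labels. It assumes the files follow the DyGIE++ convention of one JSON document per line, with a per-sentence "relations" field whose entries look like [start1, end1, start2, end2, label]; these field names and this layout are assumptions based on DyGIE++, not confirmed by this README. The helper names parse_documents and count_relation_labels are hypothetical.

```python
import json
from collections import Counter


def parse_documents(lines):
    """Parse one JSON document per non-empty line (assumed DyGIE++ layout)."""
    docs = []
    for line in lines:
        line = line.strip()
        if line:
            docs.append(json.loads(line))
    return docs


def count_relation_labels(docs):
    """Count relation labels across documents.

    Assumes each document has a "relations" list with one entry per
    sentence, and that each relation stores its label at index 4.
    """
    counts = Counter()
    for doc in docs:
        for sentence_relations in doc.get("relations", []):
            for relation in sentence_relations:
                counts[relation[4]] += 1
    return counts


# Usage with a downloaded file (path taken from the list above):
#   with open("data/mechanic/coarse/train.json", encoding="utf-8") as handle:
#       docs = parse_documents(handle)
#   print(len(docs), count_relation_labels(docs))
```

If the counts come back empty, the field names probably differ from the assumed ones, and inspecting the keys of the first document is a reasonable next step.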
We provide models pre-trained on both datasets.
From project root, run scripts/pretrained/get_mechanic_pretrained.sh to download all the available pretrained models to the pretrained directory. If you only want one model, here are the download links.
- Dependencies
- Making predictions on existing datasets
- Relation extraction evaluation metric
- Training with Allentune
This code repository is forked from DYGIE++, Wadden 2019.
This code was developed using Python 3.7. To create a new Conda environment using Python 3.7, run:
conda create --name mechanic python=3.7
This library relies on AllenNLP and uses AllenNLP shell commands to kick off training, evaluation, and testing.
We use Allentune for hyperparameter search. To install a compatible version of the Allentune library, clone the allentune git repo outside of the dygiepp directory using:
git clone https://github.com/allenai/allentune.git
Then overwrite the corresponding files in the cloned repo with the versions provided in this repository, using the command
cp -r allentune_files/[location of downloaded allentune]
Then you can proceed with installing allentune by running
pip install --editable .
in the downloaded allentune folder.
After installing allentune, install the libraries required by DyGIE++. The necessary dependencies can be installed with
pip install -r requirements.txt
To make a prediction, you can use allennlp predict. For example, to make a prediction with a pretrained granular relation model:
allennlp predict pretrained/mechanic-granular.tar.gz \
data/mechanic/granular/test.json \
--predictor dygie \
--include-package dygie \
--use-dataset-reader \
--output-file predictions/granular-test.jsonl \
--cuda-device 0 \
--silent
For predicting coarse relations using a pretrained model:
allennlp predict pretrained/mechanic-coarse.tar.gz \
data/mechanic/coarse/test.json \
--predictor dygie \
--include-package dygie \
--use-dataset-reader \
--output-file predictions/coarse-test.jsonl \
--cuda-device 0 \
--silent
Running these commands will provide json-formatted predictions.
Alternatively, you can use the prediction scripts provided in this repository to generate both .tsv and .json files. Use
python predict_coarse.py --data_dir data/mechanic/coarse --device 0 --serial_dir pretrained/mechanic-coarse.tar.gz --pred_dir predictions/coarse-test/
for coarse relation predictions and
python predict_granular.py --data_dir data/mechanic/granular --device 0 --serial_dir pretrained/mechanic-granular.tar.gz --pred_dir predictions/granular-test/
for granular relation predictions.
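To inspect the JSON predictions, it helps to map predicted token spans back to text. The sketch below assumes the DyGIE++ output convention: a "predicted_relations" field with one list per sentence, each entry shaped like [start1, end1, start2, end2, label, raw_score, softmax_score], using document-level token indices with inclusive ends. That layout is an assumption, as are the helper names and the sample document, whose label and scores are made up for illustration.

```python
def relation_to_text(doc, relation):
    """Convert one predicted relation into (arg1_text, label, arg2_text).

    Assumes document-level token offsets with inclusive end indices.
    """
    tokens = [token for sentence in doc["sentences"] for token in sentence]
    start1, end1, start2, end2, label = relation[:5]
    arg1 = " ".join(tokens[start1:end1 + 1])
    arg2 = " ".join(tokens[start2:end2 + 1])
    return (arg1, label, arg2)


def iter_predicted_relations(doc):
    """Yield readable triples for every predicted relation in a document."""
    for sentence_relations in doc.get("predicted_relations", []):
        for relation in sentence_relations:
            yield relation_to_text(doc, relation)


# Hypothetical prediction record; the label and scores are illustrative only.
sample_doc = {
    "doc_key": "example",
    "sentences": [["Masks", "reduce", "viral", "transmission", "."]],
    "predicted_relations": [[[0, 0, 2, 3, "DIRECT", 5.1, 0.98]]],
}

for triple in iter_predicted_relations(sample_doc):
    print(triple)
```

For a real predictions file such as predictions/coarse-test.jsonl, each line would be parsed with json.loads and passed to iter_predicted_relations.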
We report Precision/Recall/F1 measured by using exact and partial span-matching functions. Full details are described in our paper.
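To illustrate how the choice of span-matching function changes the scores, here is a minimal sketch of Precision/Recall/F1 over relation triples. The exact matcher compares span strings directly. The partial matcher uses a token-level Jaccard overlap with a 0.5 threshold, which is a stand-in chosen for this example and not necessarily the partial-matching function used in the paper. All names and the sample relations are hypothetical.

```python
def exact_match(pred_span, gold_span):
    """Spans match only if their text is identical."""
    return pred_span == gold_span


def token_overlap_match(pred_span, gold_span, threshold=0.5):
    """Stand-in partial matcher: Jaccard overlap of lowercased tokens."""
    pred_tokens = set(pred_span.lower().split())
    gold_tokens = set(gold_span.lower().split())
    if not pred_tokens or not gold_tokens:
        return False
    jaccard = len(pred_tokens & gold_tokens) / len(pred_tokens | gold_tokens)
    return jaccard >= threshold


def relation_matches(pred, gold, span_match):
    """Relations are (arg1, label, arg2); labels must agree exactly."""
    return (
        pred[1] == gold[1]
        and span_match(pred[0], gold[0])
        and span_match(pred[2], gold[2])
    )


def precision_recall_f1(predicted, gold, span_match):
    """Precision counts matched predictions; recall counts matched gold."""
    matched_pred = sum(
        any(relation_matches(p, g, span_match) for g in gold) for p in predicted
    )
    matched_gold = sum(
        any(relation_matches(p, g, span_match) for p in predicted) for g in gold
    )
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative relations only; not taken from the dataset.
gold = [
    ("masks", "DIRECT", "viral transmission"),
    ("vaccine", "INDIRECT", "hospitalization rates"),
]
predicted = [
    ("masks", "DIRECT", "transmission"),
    ("vaccine", "INDIRECT", "hospitalization rates"),
]

print("exact:  ", precision_recall_f1(predicted, gold, exact_match))
print("partial:", precision_recall_f1(predicted, gold, token_overlap_match))
```

In this toy example the first prediction truncates one argument, so it fails under exact matching but passes under the overlap criterion, which is why the partial scores are higher.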
We use Allentune for hyperparameter tuning. To train a model for coarse relation extraction using Allentune, you can run the script below.
python scripts/train/train_coarse_allentune.py --data_dir data/mechanic/coarse/ --serial_dir models/coarse/ --gpu_count 4 --cpu_count 12 --device 0,1,2,3
To train the model for granular relations:
python scripts/train/train_granular_allentune.py --data_dir data/mechanic/granular/ --serial_dir ./models/granular --gpu_count 4 --cpu_count 12 --device 0,1,2,3
The default number of training samples is set to 30. For more training options, use the --h flag.
To obtain predictions for the development set over all Allentune runs:
python predict_coarse_allentune.py --data_dir data/mechanic/coarse/ --device 0 --serial_dir models/coarse/ --pred_dir predictions/coarse
for the coarse relation model and
python predict_granular_allentune.py --serial_dir ./models/granular --data_dir ./data/mechanic/granular/ --pred_dir ./predictions/granular/
for the granular relation model.
You can get test set predictions by specifying the index of the single run you want to use for inference:
python predict_coarse_allentune.py --data_dir data/mechanic/coarse/ --device 0,1,2,3 --serial_dir models/coarse/ --pred_dir predictions/coarse
for coarse relations and
python predict_granular_allentune.py --serial_dir ./models/granular --data_dir ./data/mechanic/granular/ --pred_dir ./predictions/granular/ --test_data --test_index 17
for granular relations.
If using our dataset and models, please cite:
@inproceedings{mechanisms21,
title={{Extracting a Knowledge Base of Mechanisms from COVID-19 Papers}},
author={Tom Hope and Aida Amini and David Wadden and Madeleine van Zuylen and Sravanthi Parasa and Eric Horvitz and Daniel Weld and Roy Schwartz and Hannaneh Hajishirzi},
year={2021},
journal={NAACL}
}
Please don't hesitate to reach out.
Email: tomh@allenai.org, amini91@cs.washington.edu