Context-Aware Representations for Knowledge Base Relation Extraction

Relation extraction on an open-domain knowledge base

Accompanying repository for our EMNLP 2017 paper (full paper). It contains the code to replicate the experiments and the pre-trained models for sentence-level relation extraction. See below for links to other work on knowledge bases, question answering and graph neural networks.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Please use the following citation:

@inproceedings{TUD-CS-2017-0119,
	title = {{Context-Aware Representations for Knowledge Base Relation Extraction}},
	author = {Sorokin, Daniil and Gurevych, Iryna},
	booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
	pages = {1784-1789},
	year = {2017},
	location = {Copenhagen, Denmark},
	publisher = {Association for Computational Linguistics},
	doi = {10.18653/v1/D17-1188}
}

Paper abstract:

We demonstrate that for sentence-level relation extraction it is beneficial to consider other relations in the sentential context while predicting the target relation. Our architecture uses an LSTM-based encoder to jointly learn representations for all relations in a single sentence. We combine the context representations with an attention mechanism to make the final prediction. We use the Wikidata knowledge base to construct a dataset of multiple relations per sentence and to evaluate our approach. Compared to a baseline system, our method results in an average error reduction of 24% on a held-out set of relations.
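To make the idea in the abstract more concrete, here is a minimal pure-Python sketch of attention-weighted aggregation over context relations. It is not the code from this repository: the function names (`softmax`, `attend_to_context`), the dot-product scoring and the toy vectors are all assumptions chosen for illustration. Refer to the paper for the actual model.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into weights that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_context(target, contexts):
    """Combine context relation vectors, weighted by similarity to the target.

    Assumption: plain dot-product scoring; the paper's scoring may differ.
    """
    if not contexts:
        return [0.0] * len(target), []
    scores = [sum(t * c for t, c in zip(target, ctx)) for ctx in contexts]
    weights = softmax(scores)
    summary = [
        sum(w * ctx[i] for w, ctx in zip(weights, contexts))
        for i in range(len(target))
    ]
    return summary, weights


# Toy vectors standing in for learned relation representations.
target = [1.0, 0.0]
contexts = [[1.0, 0.0], [0.0, 1.0]]
summary, weights = attend_to_context(target, contexts)

# A classifier would then see the target and the context summary together,
# here simply concatenated.
combined = target + summary
```

In this toy case the first context vector matches the target, so it receives the larger attention weight.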

Please refer to the paper for more details.

The dataset described in the paper can be found here:

Contacts:

If you have any questions regarding the code, please don't hesitate to contact the authors or report an issue.

Demo:

You can try out the relation extraction model on single sentences in our demo:

http://semanticparsing.ukp.informatik.tu-darmstadt.de:5000/relation-extraction/

UKP Lab's work on knowledge bases:

If you came here looking for our other work on linking text to Wikidata, you may also find the following links useful:

Wikipedia-Wikidata sentence-level relation data set

  • Download the data set from the paper here. See the data set ReadMe for more information on the format and see the paper on data set construction.

Project structure:

relation_extraction/
├── eval.py
├── model-train-and-test.py
├── notebooks
├── optimization_space.py
├── core
│   ├── parser.py
│   ├── embeddings.py
│   ├── entity_extraction.py
│   └── keras_models.py
├── relextserver
│   └── server.py
├── graph
│   ├── graph_utils.py
│   ├── io.py
│   └── vis_utils.py
├── stanford_tag_dataset.py
└── evaluation
    └── metrics.py
resources/
├── properties-with-labels.txt
└── property_blacklist.txt

| File | Description |
|---|---|
| relation_extraction/ | Main Python module |
| relation_extraction/core | Models for joint relation extraction |
| relation_extraction/relextserver | The code for the web demo |
| relation_extraction/graph | IO and processing for relation graphs |
| relation_extraction/evaluation | Evaluation metrics |
| resources/ | Necessary resources |
| data/curves/ | The precision-recall curves for each model on the held-out data |

Setup:

  1. We recommend that you set up a new virtual environment first: http://docs.python-guide.org/en/latest/dev/virtualenvs/

  2. Check out the repository and run:

pip3 install -r requirements.txt

  3. Set the Keras (deep learning library) backend to TensorFlow with the following command:

export KERAS_BACKEND=tensorflow

You can also permanently change the Keras backend (read more: https://keras.io/backend/). Note that to reproduce the experiments in the paper you have to use Theano as the backend instead.

  4. If you want to replicate the experiments from the paper, download the data. Extract the archive inside emnlp2017-relation-extraction/data/wikipedia-wikidata/. The data was preprocessed using Stanford CoreNLP 3.7.0 models. See stanford_tag_dataset.py for more information.

  5. Download the GloVe embeddings (glove.6B.zip) and put them into the folder emnlp2017-relation-extraction/resources/glove/. You can change the path to the word embeddings in the model_params.json file if needed.
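The backend and embedding steps above can also be checked from Python. The following is a minimal sketch, not code from this repository: the helper names `ensure_keras_backend` and `load_glove` are hypothetical, and the two-line sample stands in for a real GloVe file. It relies only on two facts: Keras reads the `KERAS_BACKEND` environment variable when it is imported, and GloVe text files contain one token per line followed by its space-separated vector components.

```python
import io
import os


def ensure_keras_backend(default="tensorflow"):
    """Set KERAS_BACKEND unless it is already set.

    Call this before `import keras`, since the variable is read at import time.
    """
    return os.environ.setdefault("KERAS_BACKEND", default)


def load_glove(handle):
    """Parse GloVe text format into a dict mapping token -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


backend = ensure_keras_backend()

# Tiny in-memory stand-in for a real embeddings file.
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
embeddings = load_glove(sample)
```

With the real data you would pass an open file instead of the sample, for example one of the text files extracted from glove.6B.zip such as glove.6B.50d.txt (opened with `encoding="utf-8"`). Which dimensionality the models expect is set in model_params.json and is not assumed here.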

Pre-trained models:

  • You can download the models that were used in the experiments here
  • See Using pre-trained models.ipynb for a detailed example on how to use the pre-trained models in your code

Reproducing the experiments from the paper

To reproduce the experiments, please refer to the version of the code that was published with the paper: tag emnlp17

In any other case, we recommend using the most recent version.

  1. Complete the setup above

  2. Run python model_train.py in emnlp2017-relation-extraction/relation_extraction/ to see the list of parameters

  3. If you put the data into the default folders you can train the ContextWeighted model with the following command:

python model_train.py model_ContextWeighted train ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-training.02_06.json ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-validation.02_06.json
  4. Run the following command to compute the precision-recall curves:
python precision_recall_curves.py model_ContextWeighted ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-held-out.02_06.json
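For readers unfamiliar with precision-recall curves, the sketch below shows how the points on such a curve can be derived from ranked predictions. It is a generic illustration, not the logic of precision_recall_curves.py: the function name `precision_recall_points` and the toy scores and labels are assumptions, and tied scores are handled naively, in sort order.

```python
def precision_recall_points(scores, labels):
    """Return (recall, precision) pairs, one per prediction.

    Predictions are sorted by descending confidence; each prefix of the
    ranking corresponds to one threshold on the confidence score.
    """
    total_positive = sum(labels)
    if total_positive == 0:
        return []  # recall is undefined without any positive example
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    true_positive = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positive += label
        recall = true_positive / total_positive
        precision = true_positive / rank
        points.append((recall, precision))
    return points


# Toy data: confidence scores and whether each prediction was correct (1) or not (0).
scores = [0.9, 0.8, 0.6, 0.3]
labels = [1, 0, 1, 0]
curve = precision_recall_points(scores, labels)
```

Plotting the resulting pairs with recall on the x-axis and precision on the y-axis gives the curve.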

Notes

  • The web demo code is provided for information only. It is not meant to be run elsewhere.

Requirements:

  • Python 3.6
  • Keras 2.1.5
  • TensorFlow 1.6.0
  • See requirements.txt for library requirements.

License:

  • Apache License Version 2.0