Source code for the EMNLP 2018 paper: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information.
Overview of RESIDE (proposed method): RESIDE first encodes each sentence in the bag by concatenating embeddings (denoted by ⊕) from Bi-GRU and Syntactic GCN for each token, followed by word attention.
Then, the sentence embedding is concatenated with relation alias information, which comes from the Side Information Acquisition section, before attention over sentences is computed. Finally, the bag representation with entity type information is fed to a softmax classifier. Please refer to the paper for more details.
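The last two steps of the overview (attention over sentences, then appending entity type information) can be illustrated with a minimal pure-Python sketch. This is not the code in reside.py: the function `bag_representation`, the dot-product scoring against a `query` vector, and all numeric values are assumptions made for illustration, and the sentence embeddings are taken as given rather than produced by a Bi-GRU and GCN.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bag_representation(sent_vecs, alias_vecs, query, type_vec):
    """Hypothetical helper: combine the sentences of one bag into one vector."""
    # Concatenate each sentence embedding with its relation alias embedding.
    joined = [s + a for s, a in zip(sent_vecs, alias_vecs)]
    # Attention over sentences: score each sentence against the query vector.
    weights = softmax([dot(v, query) for v in joined])
    # The weighted sum of the sentence vectors gives the bag vector.
    dim = len(joined[0])
    bag = [sum(w * v[i] for w, v in zip(weights, joined)) for i in range(dim)]
    # Append entity type information before the softmax classifier.
    return bag + type_vec, weights


# Toy bag with two sentences (all values are made up).
sent_vecs = [[1.0, 0.0], [0.0, 1.0]]
alias_vecs = [[0.5], [0.5]]
query = [1.0, 0.0, 0.0]
type_vec = [0.1, 0.2]

rep, weights = bag_representation(sent_vecs, alias_vecs, query, type_vec)
print(len(rep), weights)
```

In this toy bag the first sentence aligns with the query, so it receives the larger attention weight and contributes more to the bag vector.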
- Compatible with TensorFlow 1.x and Python 3.x.
- Dependencies can be installed using `requirements.txt`.
- We use the Riedel NYT and Google IISc Distant Supervision (GIDS) datasets for evaluation.
- The processed version of the datasets can be downloaded from here. The structure of the processed input data is as follows.
{
    "voc2id": {"w1": 0, "w2": 1, ...},
    "type2id": {"type1": 0, "type2": 1 ...},
    "max_pos": 123,
    "train": [
        {
            "X": [[s1_w1, s1_w2, ...], [s2_w1, s2_w2, ...], ...],
            "Y": [bag_label],
            "Pos1": [[s1_p1_1, sent1_p1_2, ...], [s2_p1_1, s2_p1_2, ...], ...],
            "Pos2": [[s1_p2_1, sent1_p2_2, ...], [s2_p2_1, s2_p2_2, ...], ...],
            "SubPos": [s1_sub, s2_sub, ...],
            "ObjPos": [s1_obj, s2_obj, ...],
            "SubType": [s1_subType, s2_subType, ...],
            "ObjType": [s1_objType, s2_objType, ...],
            "ProbY": [[s1_rel_alias1, s1_rel_alias2, ...], [s2_rel_alias1, ... ], ...]
            "DepEdges": [[s1_dep_edges], [s2_dep_edges] ...]
        },
        {}, ...
    ],
    "test": { same as "train"},
    "valid": { same as "train"},
}
- `voc2id` is the mapping of words to their unique identifiers.
- `type2id` is the mapping of entity types to their unique identifiers.
- `max_pos` is the maximum position to consider for positional embeddings.
- Each entry of `train`, `test` and `valid` is a bag of sentences, where
  - `X` denotes the sentences in the bag as a list of lists of word indices.
  - `Y` is the relation expressed by the sentences in the bag.
  - `Pos1` and `Pos2` are the positions of each word in the sentences with respect to target entity 1 and entity 2.
  - `SubPos` and `ObjPos` contain the positions of target entity 1 and entity 2 in each sentence.
  - `SubType` and `ObjType` contain the type information of target entity 1 and entity 2, obtained from the KG.
  - `ProbY` is the relation alias side information (refer to the paper) for the bag.
  - `DepEdges` is the edge list of the dependency parse for each sentence (required for the GCN).
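To make the field descriptions above concrete, here is a small sketch that builds a toy dataset with the same keys, round-trips it through `pickle`, and checks that the per-sentence fields of a bag line up. The helpers `check_bag` and `decode` are hypothetical and not part of this repository. The toy values are also assumptions, including the raw position offsets and the `(head, dependent, label)` edge tuples, since the exact encodings are not specified here. The sketch assumes the processed file is a plain pickle of this dictionary.

```python
import pickle

# Toy data following the documented structure (all values are made up).
toy = {
    "voc2id": {"obama": 0, "was": 1, "born": 2, "in": 3, "hawaii": 4},
    "type2id": {"person": 0, "location": 1},
    "max_pos": 60,
    "train": [{
        "X": [[0, 1, 2, 3, 4]],
        "Y": [1],
        "Pos1": [[0, 1, 2, 3, 4]],       # offsets from entity 1 (assumed encoding)
        "Pos2": [[-4, -3, -2, -1, 0]],   # offsets from entity 2 (assumed encoding)
        "SubPos": [0],
        "ObjPos": [4],
        "SubType": [0],
        "ObjType": [1],
        "ProbY": [[1]],
        "DepEdges": [[(2, 0, "nsubjpass"), (2, 4, "nmod")]],  # assumed edge format
    }],
    "test": [],
    "valid": [],
}

PER_SENTENCE_KEYS = ["X", "Pos1", "Pos2", "SubPos", "ObjPos",
                     "SubType", "ObjType", "ProbY", "DepEdges"]


def check_bag(bag):
    """Return the number of sentences, raising if per-sentence fields disagree."""
    n = len(bag["X"])
    for key in PER_SENTENCE_KEYS:
        if len(bag[key]) != n:
            raise ValueError(f"{key} has {len(bag[key])} entries, expected {n}")
    for words, p1, p2 in zip(bag["X"], bag["Pos1"], bag["Pos2"]):
        if not (len(words) == len(p1) == len(p2)):
            raise ValueError("position lists must match sentence length")
    return n


def decode(bag, voc2id):
    """Map the word indices of each sentence back to words."""
    id2voc = {idx: word for word, idx in voc2id.items()}
    return [[id2voc[i] for i in sent] for sent in bag["X"]]


# In-memory round trip; for a real file one would instead use
# pickle.load(open(path, "rb")) with the path of the processed dataset.
data = pickle.loads(pickle.dumps(toy))
first_bag = data["train"][0]
print(check_bag(first_bag), decode(first_bag, data["voc2id"]))
```

A check like this is mainly useful when producing a processed file for a new dataset, since a mismatch between `X` and the position or side-information lists is easy to introduce during preprocessing.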
- `reside.py` contains a TensorFlow (1.x) based implementation of RESIDE (the proposed method).
- Download the pretrained model's parameters from here.
- Execute `evaluate.sh` to compare the pretrained RESIDE model against baselines (plots a Precision-Recall curve).
- Entity type information is provided in `side_info/type_info.zip`.
  - Entity type information can be used directly in the model.
- Relation alias information is provided in `side_info/relation_alias.zip`.
- Execute `setup.sh` to download GloVe embeddings.
- For training RESIDE run:
python reside.py -data data/riedel_processed.pkl -name new_run
@inproceedings{Vashishth2018reside,
author = {Vashishth, Shikhar and Joshi, Rishabh and Prayaga, Sai Suman and Bhattacharyya, Chiranjib and Talukdar, Partha},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information},
url = {https://shikhar-vashishth.github.io/assets/pdf/reside_emnlp18.pdf},
year = {2018}
}
For any clarification, comments, or suggestions please create an issue or contact shikhar@iisc.ac.in.