Relation Classification - SEMEVAL 2010 task 8 dataset
- Relation classification is the task of assigning predefined relation labels to pairs of entities that occur in text.
- Example:
- Sentence: [People]_e1 have been moving back into [downtown]_e2
- Relation: Entity-Destination(e1,e2) where e1 = people, e2 = downtown
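The bracket notation in the example above can be unpacked mechanically. The following is a minimal sketch, assuming the `[text]_e1` / `[text]_e2` markup and the `Relation(e1,e2)` label form shown here. `parse_example` is a hypothetical helper for illustration and is not part of this repository.

```python
import re

# Matches "[some text]_e1" or "[some text]_e2" (notation from the example above).
ENTITY_RE = re.compile(r"\[([^\]]+)\]_(e[12])")
# Matches directed labels such as "Entity-Destination(e1,e2)".
LABEL_RE = re.compile(r"^([\w-]+)\((e[12]),(e[12])\)$")


def parse_example(sentence, label):
    """Return (plain_sentence, entities, relation, direction)."""
    entities = {tag: text for text, tag in ENTITY_RE.findall(sentence)}
    plain = ENTITY_RE.sub(lambda m: m.group(1), sentence)
    match = LABEL_RE.match(label)
    if match is None:
        # Labels without an argument order are returned unchanged.
        return plain, entities, label, None
    relation, first, second = match.groups()
    return plain, entities, relation, (first, second)


plain, entities, relation, direction = parse_example(
    "[People]_e1 have been moving back into [downtown]_e2",
    "Entity-Destination(e1,e2)",
)
print(plain)
print(entities, relation, direction)
```

The direction tuple records which entity fills which argument slot, so `(e1,e2)` and `(e2,e1)` variants of the same relation stay distinguishable.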
@MastersThesis{Sahitya:2018,
author = { {Sahitya Patel} and Harish Karnick},
title = {Multi-Way Classification of Relations Between Pairs of Entities},
school = {Indian Institute of Technology Kanpur (IITK)},
address = {India},
year = 2018,
month = 6
}
Relation-Classification-github.pdf
Paper: SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals
Zip: SemEval2010_task8_all_data.zip
- 01_create_train_test_attn
- 02_train_val_split
- 03_data_preprocess
- 04_CBGRU_MEA_Model
- Python 3.5.4 | Anaconda custom (64-bit)
- Keras 2.1.5
- Tensorflow 1.4.0
- CUDA compilation tools, release 8.0, V8.0.44 (nvcc --version)
- CuDNN 6.0.21
- Perl
- Get the preprocessed data: download "data_all.npy" from this link (94.6 MB) and place it in the "./data/" folder.
- Run "04_CBGRU_MEA_Model"
Description: Pre-processing of dataset files
Reads:
- "./corpus/SemEval2010_task8_training/TRAIN_FILE.TXT"
- "./corpus/SemEval2010_task8_testing_keys/TEST_FILE_FULL.TXT"
Creates:
- "./files/train_attn.txt"
- "./files/test_attn.txt"
To Do:
- Set the following path in "01_create_train_test_attn" so that it points to your local Stanford POS Tagger directory:
os.environ['CLASSPATH'] = "H:/Relation-Classification/stanford/stanford-postagger-2017-06-09"
- Run "01_create_train_test_attn"
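For orientation, reading the raw dataset files might look like the sketch below. It assumes the record layout of the public SemEval-2010 Task 8 release: a tab-separated ID and quoted sentence with `<e1>`/`<e2>` tags, then the relation label, then a comment line, then a blank line. `read_records` is a hypothetical helper, the sample record is illustrative, and the sketch does not reproduce what the notebook writes to "train_attn.txt".

```python
import re

# Assumed markup in the raw files: <e1>...</e1> and <e2>...</e2>.
TAG_RE = re.compile(r"<(e[12])>(.*?)</\1>")


def read_records(text):
    """Parse raw dataset text into a list of dicts (hypothetical helper)."""
    records = []
    # Records are assumed to be separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = block.splitlines()
        if len(rows) < 2 or "\t" not in rows[0]:
            continue
        sentence_id, sentence = rows[0].split("\t", 1)
        sentence = sentence.strip().strip('"')
        entities = dict(TAG_RE.findall(sentence))
        records.append({
            "id": int(sentence_id),
            "sentence": TAG_RE.sub(r"\2", sentence),  # tags removed
            "e1": entities.get("e1"),
            "e2": entities.get("e2"),
            "label": rows[1].strip(),
        })
    return records


# Illustrative record built from the example sentence in this README.
sample = (
    '1\t"<e1>People</e1> have been moving back into <e2>downtown</e2>."\n'
    "Entity-Destination(e1,e2)\n"
    "Comment:\n"
)
print(read_records(sample))
```

To apply it to the real data, pass the contents of "./corpus/SemEval2010_task8_training/TRAIN_FILE.TXT" to `read_records`.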
Description: Splitting the training data into training and validation sets
Reads:
- "./files/train_attn.txt"
- "./files/test_attn.txt"
Creates:
- "./files/train_attn_sp.txt"
- "./files/val_attn_sp.txt"
- "./files/test_attn_sp.txt"
To Do:
- Run "02_train_val_split"
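The idea behind this step can be sketched as a seeded random hold-out split. The validation fraction, the seed, and the `train_val_split` helper are all assumptions for illustration. The split actually used by the notebook is not documented here and may differ (for example, it could be stratified by relation label).

```python
import random


def train_val_split(lines, val_fraction=0.1, seed=42):
    """Split a list of examples into (train, val), keeping the original order.

    val_fraction and seed are illustrative defaults, not the notebook's values.
    """
    indices = list(range(len(lines)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_val = int(round(len(lines) * val_fraction))
    val_idx = set(indices[:n_val])
    train = [line for i, line in enumerate(lines) if i not in val_idx]
    val = [line for i, line in enumerate(lines) if i in val_idx]
    return train, val


examples = ["example {}".format(i) for i in range(100)]
train, val = train_val_split(examples)
print(len(train), len(val))
```

Fixing the seed matters here because the validation set is used for model selection, so results are only comparable across runs if the split stays the same.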
Description: Generating a single input file for the model
Creates:
- "./data/data_all.npy"
Steps:
- Place "GoogleNews-vectors-negative300.bin" in the "./word_embeddings" folder.
- Run "./word_embeddings/GoogleNews-vectors-negative300_bin_to_txt.py" to create "./word_embeddings/GoogleNews-vectors-negative300.txt"
- Run "03_data_preprocess"
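The binary-to-text conversion in the steps above can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the repository's script: it assumes the usual word2vec binary layout (an ASCII header line with vocabulary size and dimension, then for each word its bytes, a space, and little-endian float32 values). `word2vec_bin_to_txt` is a hypothetical name.

```python
import struct


def word2vec_bin_to_txt(bin_path, txt_path):
    """Convert a word2vec binary file to the plain-text word2vec format."""
    with open(bin_path, "rb") as src, open(txt_path, "w", encoding="utf-8") as dst:
        # Header line: "<vocab_size> <dim>\n"
        vocab_size, dim = (int(x) for x in src.readline().split())
        dst.write("{} {}\n".format(vocab_size, dim))
        unpack = struct.Struct("<{}f".format(dim)).unpack
        for _ in range(vocab_size):
            # Read the word byte by byte up to the separating space.
            chars = bytearray()
            while True:
                ch = src.read(1)
                if ch == b" " or ch == b"":
                    break
                if ch != b"\n":  # some writers emit a newline after each vector
                    chars.extend(ch)
            word = chars.decode("utf-8", errors="ignore")
            # Exactly dim float32 values follow the word.
            vector = unpack(src.read(4 * dim))
            dst.write(word + " " + " ".join(repr(v) for v in vector) + "\n")
    return vocab_size, dim
```

For the file named in the steps above, the call would be `word2vec_bin_to_txt("./word_embeddings/GoogleNews-vectors-negative300.bin", "./word_embeddings/GoogleNews-vectors-negative300.txt")`. A pure-Python loop like this is likely to be slow on a file of that size, so the provided conversion script remains the intended route.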
Description: Model training. The best model is saved in the "./model" folder.
Steps:
- Run "04_CBGRU_MEA_Model"
Creates:
- "./model/model.keras" - Model