Distant supervision algorithms with neural networks.
To train the deep-distant supervision model, try
python main.py --is_train=True --num_hidden=256 --data_set=nyt --num_epoch=3 --save_gap=1000 --print_gap=100
Examples of some configurable model parameters are:
- word_attn: boolean, whether to use a word-level attention layer
- sent_attn: boolean, whether to use a sentence-level attention layer
- bidirectional: boolean, whether to use a bidirectional RNN
- data_set: path to the dataset folder. The folder should contain train_x.npy and train_y.npy for training
- save_gap: the number of batch steps between saves of the partially trained model
- print_gap: print the training status every print_gap batch steps
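Before starting a long training run, it can help to confirm that the dataset folder actually holds the two files named in the data_set description. The following is a minimal sketch, not part of the repository: the helper name missing_training_files is hypothetical, and the data/nyt location is an assumption based on the file paths listed in this README.

```python
import os

# File names taken from the data_set description above.
REQUIRED_FILES = ("train_x.npy", "train_y.npy")


def missing_training_files(data_dir):
    """Return the required training files that are absent from data_dir."""
    return [
        name
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


if __name__ == "__main__":
    # "data/nyt" is an assumed location; use the folder you pass as --data_set.
    missing = missing_training_files(os.path.join("data", "nyt"))
    if missing:
        print("Missing training files:", ", ".join(missing))
    else:
        print("All training files found.")
```

The check only looks at file presence; it does not validate array shapes or dtypes, which this README does not specify.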
To see all possible configurations, try
python main.py --help
The model requires an input file, a predefined set of relations, and precomputed word-embedding vectors. Take a look at the following files for the required input format:
- data/nyt/train.txt: each line consists of (entity1_id, entity2_id, entity1_surface_form, entity2_surface_form, relation, sentence)
- data/nyt/relation2id: list of relations with their ids
- data/word2vec.txt: each line consists of (word_token embedding_vector)
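The three formats above could be read with something like the sketch below. It is an illustration under stated assumptions rather than the repository's own loader: the function names are hypothetical, fields are assumed to be whitespace-separated, entity surface forms are assumed to contain no spaces, and relation2id lines are assumed to be ordered as "relation id". The sample rows are made up for the demonstration.

```python
TRAIN_KEYS = (
    "entity1_id", "entity2_id", "entity1_surface_form",
    "entity2_surface_form", "relation", "sentence",
)


def parse_train_line(line):
    """Split one train.txt line into its six fields.

    Assumes whitespace-separated fields; everything after the fifth
    field is kept together as the sentence.
    """
    fields = line.strip().split(None, 5)
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    return dict(zip(TRAIN_KEYS, fields))


def parse_relation_line(line):
    """Split one relation2id line into (relation, id); assumes 'relation id' order."""
    relation, rel_id = line.split()
    return relation, int(rel_id)


def parse_embedding_line(line):
    """Split one word2vec.txt line into (word_token, embedding_vector)."""
    token, *values = line.split()
    return token, [float(v) for v in values]


if __name__ == "__main__":
    # Made-up sample rows, not taken from the dataset.
    row = parse_train_line(
        "id_a id_b some_person some_place /example/relation "
        "some_person was born in some_place ."
    )
    print(row["relation"], "|", row["sentence"])
    print(parse_relation_line("/example/relation 1"))
    print(parse_embedding_line("some_place 0.1 -0.2 0.3"))
```

If the real files use tabs and surface forms with embedded spaces, splitting on "\t" instead of general whitespace would be the safer choice.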
To test the trained model, try
python main.py --is_train=False --data_set=nyt
Requirements:
- TensorFlow 1.2
- numpy, tqdm, scikit-learn