deep-distant-supervision

Distant supervision algorithms with neural networks.

Model

Training

To train the deep-distant-supervision model, try

python main.py --is_train=True --num_hidden=256 --data_set=nyt --num_epoch=3 --save_gap=1000 --print_gap=100

Some of the configurable model parameters are:

  • word_attn: boolean, whether to use the word-level attention layer
  • sent_attn: boolean, whether to use the sentence-level attention layer
  • bidirectional: boolean, whether to use a bidirectional RNN
  • data_set: path to the dataset folder; for training, the folder should contain train_x.npy and train_y.npy
  • save_gap: save a partially trained model every save_gap batch steps
  • print_gap: print training status every print_gap batch steps

To see all possible configurations, try

python main.py --help
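
In TensorFlow 1.x, command-line options like the ones above are usually declared with tf.app.flags. The snippet below is only a sketch of how the listed flags could be defined; the defaults and docstrings are assumptions, not the repository's actual code.

import tensorflow as tf

flags = tf.app.flags
flags.DEFINE_boolean('is_train', True, 'run training instead of testing')
flags.DEFINE_boolean('word_attn', True, 'use word-level attention layer')
flags.DEFINE_boolean('sent_attn', True, 'use sentence-level attention layer')
flags.DEFINE_boolean('bidirectional', True, 'use a bidirectional RNN')
flags.DEFINE_integer('num_hidden', 256, 'number of hidden units')
flags.DEFINE_integer('num_epoch', 3, 'number of training epochs')
flags.DEFINE_integer('save_gap', 1000, 'save a checkpoint every save_gap batch steps')
flags.DEFINE_integer('print_gap', 100, 'print training status every print_gap batch steps')
flags.DEFINE_string('data_set', 'nyt', 'dataset folder under data/')
FLAGS = flags.FLAGS

def main(_):
    # placeholder training/testing dispatch; the real main.py builds and runs the model here
    if FLAGS.is_train:
        print('training with %d hidden units on %s' % (FLAGS.num_hidden, FLAGS.data_set))
    else:
        print('testing on %s' % FLAGS.data_set)

if __name__ == '__main__':
    tf.app.run()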

Data format

The model requires an input file, a predefined set of relations, and precomputed word-embedding vectors. Take a look at the following files for the required input format (a minimal loading sketch follows the list):

  • data/nyt/train.txt: each line consists of (entity1_id, entity2_id, entity1_surface_form, entity2_surface_form, relation, sentence)
  • data/nyt/relation2id: list of relations with their ids
  • data/word2vec.txt: each line consists of (word_token embedding_vector)
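
For illustration, the three files described above could be read roughly as follows. This is a sketch, not the repository's loader: it assumes train.txt is tab-separated, relation2id has one whitespace-separated "relation id" pair per line, and word2vec.txt is whitespace-separated; adjust the delimiters to the actual files.

import numpy as np

def load_relations(path):
    # relation2id: one "relation id" pair per line (assumed format)
    relation2id = {}
    with open(path) as f:
        for line in f:
            rel, rid = line.split()
            relation2id[rel] = int(rid)
    return relation2id

def load_word2vec(path):
    # word2vec.txt: a word token followed by its embedding values
    vocab, vectors = {}, []
    with open(path) as f:
        for line in f:
            parts = line.split()
            vocab[parts[0]] = len(vectors)
            vectors.append([float(v) for v in parts[1:]])
    return vocab, np.array(vectors, dtype=np.float32)

def load_train(path):
    # train.txt: (entity1_id, entity2_id, entity1_surface_form, entity2_surface_form, relation, sentence)
    # the tab delimiter is an assumption; change it if the file uses another separator
    examples = []
    with open(path) as f:
        for line in f:
            e1_id, e2_id, e1, e2, relation, sentence = line.rstrip('\n').split('\t')
            examples.append((e1_id, e2_id, e1, e2, relation, sentence))
    return examples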

Testing

To test the trained model, try

python main.py --is_train=False --data_set=nyt
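
The README does not say what the test run writes out. If it produces per-prediction relation scores and gold labels, the usual metric for distantly supervised relation extraction is a precision-recall curve, which can be computed with scikit-learn (already in the requirements). The file names below are hypothetical placeholders.

import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# hypothetical test outputs: one predicted score and one binary gold label
# per (entity pair, relation) candidate
scores = np.load('test_scores.npy')   # predicted probabilities, shape (n,)
labels = np.load('test_labels.npy')   # 1 if the relation holds, else 0

precision, recall, _ = precision_recall_curve(labels, scores)
print('area under PR curve: %.4f' % average_precision_score(labels, scores))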

Requirements

  • TensorFlow 1.2
  • numpy, tqdm, scikit-learn
