Convolutional Neural Networks for Sentence Classification

Unofficial DyNet implementation of the paper Convolutional Neural Networks for Sentence Classification (EMNLP 2014) [1].

1. Requirements

  • Python 3.6.0+
  • DyNet 2.0+
  • NumPy 1.12.1+
  • gensim 2.3.0+
  • scikit-learn 0.19.0+
  • tqdm 4.15.0+

2. Prepare dataset

To get the movie review data [2] and pretrained word embeddings [3], run

sh data_download.sh

and

python preprocess.py

If you use your own dataset, specify the paths to the training and validation data files with the command-line arguments described below.
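
The exact file format is defined by preprocess.py, but presumably each *_x file holds one whitespace-tokenized sentence per line and each *_y file one integer label per line. A minimal loader sketch under that assumption (the function name is illustrative):

def load_data(x_path, y_path):
    # Assumed format: one tokenized sentence per line in x,
    # one integer label (0/1) per line in y; preprocess.py may differ.
    with open(x_path, encoding="utf-8") as fx, open(y_path, encoding="utf-8") as fy:
        xs = [line.split() for line in fx]
        ys = [int(line) for line in fy]
    assert len(xs) == len(ys), "x and y line counts must match"
    return xs, ys

train_x, train_y = load_data("./data/train_x.txt", "./data/train_y.txt")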

3. Train

Arguments

  • --gpu: GPU ID to use. For CPU, set -1 [default: 0]
  • --train_x_path: File path of train x data [default: ./data/train_x.txt]
  • --train_y_path: File path of train y data [default: ./data/train_y.txt]
  • --valid_x_path: File path of valid x data [default: ./data/valid_x.txt]
  • --valid_y_path: File path of valid y data [default: ./data/valid_y.txt]
  • --n_epochs: Number of epochs [default: 10]
  • --batch_size: Mini-batch size [default: 64]
  • --win_sizes: Window sizes of filters [default: [3, 4, 5]]
  • --num_fil: Number of filters per window size (see the sketch after this list) [default: 100]
  • --s: L2-norm constraint on the weight vectors (max-norm regularization) [default: 3.0]
  • --dropout_prob: Dropout probability [default: 0.5]
  • --v_strategy: Embedding strategy [default: non-static]
    • rand: Initialize embeddings randomly.
    • static: Load pretrained embeddings and keep them fixed during training.
    • non-static: Load pretrained embeddings and fine-tune them during training.
    • multichannel: Load pretrained embeddings as two channels; fine-tune one and keep the other fixed.
  • --alloc_mem: Amount of memory to allocate [MB] [default: 4096]
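
To make --win_sizes, --num_fil, --dropout_prob, and --s concrete, here is a minimal DyNet sketch of the forward pass and the max-norm rescale. All names and dimensions are illustrative assumptions, not the repo's actual code; the dy.parameter() wrapping follows the DyNet 2.0 API.

import numpy as np
import dynet as dy

VOCAB_SIZE, EMB_DIM = 20000, 300     # hypothetical vocabulary/embedding sizes
WIN_SIZES, NUM_FIL = [3, 4, 5], 100  # --win_sizes and --num_fil defaults

pc = dy.ParameterCollection()
E = pc.add_lookup_parameters((VOCAB_SIZE, EMB_DIM))       # word embeddings
Fs = [pc.add_parameters((w, EMB_DIM, 1, NUM_FIL)) for w in WIN_SIZES]
W_out = pc.add_parameters((2, NUM_FIL * len(WIN_SIZES)))  # 2-class softmax layer
b_out = pc.add_parameters(2)

def forward(word_ids, train=True, dropout_prob=0.5):
    # Stack word vectors into a (seq_len, EMB_DIM, 1) "image".
    sent = dy.concatenate([dy.reshape(E[i], (1, EMB_DIM, 1)) for i in word_ids])
    pooled = []
    for w, F in zip(WIN_SIZES, Fs):
        conv = dy.rectify(dy.conv2d(sent, dy.parameter(F), stride=[1, 1], is_valid=True))
        n = len(word_ids) - w + 1  # positions left after a valid convolution
        pooled.append(dy.max_dim(dy.reshape(conv, (n, NUM_FIL)), d=0))  # max-over-time
    h = dy.concatenate(pooled)     # len(WIN_SIZES) * NUM_FIL features
    if train:
        h = dy.dropout(h, dropout_prob)  # --dropout_prob
    return dy.parameter(W_out) * h + dy.parameter(b_out)

def rescale(W, s=3.0):
    # Simplified max-norm constraint (--s): rescale the whole matrix when its
    # L2 norm exceeds s (the paper constrains individual weight vectors).
    w = W.as_array()
    norm = np.linalg.norm(w)
    if norm > s:
        W.set_value(w * (s / norm))

# Usage: one computation graph per example.
dy.renew_cg()
loss = dy.pickneglogsoftmax(forward([12, 7, 431, 2]), 1)  # hypothetical word ids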

Command example

python train_manualbatch.py --n_epochs 20

4. Test

Arguments

  • --gpu: GPU ID to use. For CPU, set -1 [default: 0]
  • --model_file: Model to use for prediction [default: ./model]
  • --input_file: Input file path [default: ./data/valid_x.txt]
  • --output_file: Output file path [default: ./pred_y.txt]
  • --w2i_file: Word2Index file path (see the sketch after this list) [default: ./w2i.dump]
  • --i2w_file: Index2Word file path [default: ./i2w.dump]
  • --alloc_mem: Amount of memory to allocate [MB] [default: 1024]
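
The w2i/i2w files are presumably pickled word-to-index and index-to-word dicts; a sketch of mapping a raw sentence to indices under that assumption (the <unk> key is hypothetical):

import pickle

with open("./w2i.dump", "rb") as f:
    w2i = pickle.load(f)  # assumed: dict mapping word -> index

sentence = "a gripping and moving film".split()
unk = w2i.get("<unk>", 0)  # hypothetical unknown-word index
ids = [w2i.get(w, unk) for w in sentence]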

Command example

python test.py

5. Results

All models below are trained with the default arguments except --v_strategy; a sketch of how the F1/Acc columns can be computed follows the logs.

v_strategy: static
EPOCH: 1, Train Loss: 0.636 (F1: 0.640, Acc: 0.642), Valid Loss: 0.567 (F1: 0.606, Acc: 0.694), Time: 1.614[s]
EPOCH: 2, Train Loss: 0.474 (F1: 0.770, Acc: 0.774), Valid Loss: 0.494 (F1: 0.734, Acc: 0.761), Time: 4.307[s]
EPOCH: 3, Train Loss: 0.393 (F1: 0.829, Acc: 0.830), Valid Loss: 0.460 (F1: 0.776, Acc: 0.785), Time: 6.987[s]
EPOCH: 4, Train Loss: 0.329 (F1: 0.866, Acc: 0.867), Valid Loss: 0.454 (F1: 0.782, Acc: 0.789), Time: 9.686[s]
EPOCH: 5, Train Loss: 0.272 (F1: 0.897, Acc: 0.898), Valid Loss: 0.452 (F1: 0.783, Acc: 0.792), Time: 12.384[s]
EPOCH: 6, Train Loss: 0.217 (F1: 0.929, Acc: 0.929), Valid Loss: 0.445 (F1: 0.808, Acc: 0.809), Time: 15.088[s]
EPOCH: 7, Train Loss: 0.167 (F1: 0.956, Acc: 0.956), Valid Loss: 0.446 (F1: 0.813, Acc: 0.810), Time: 17.798[s]
EPOCH: 8, Train Loss: 0.129 (F1: 0.971, Acc: 0.972), Valid Loss: 0.452 (F1: 0.810, Acc: 0.805), Time: 20.509[s]
EPOCH: 9, Train Loss: 0.102 (F1: 0.981, Acc: 0.981), Valid Loss: 0.458 (F1: 0.809, Acc: 0.806), Time: 23.202[s]
EPOCH: 10, Train Loss: 0.086 (F1: 0.988, Acc: 0.988), Valid Loss: 0.459 (F1: 0.810, Acc: 0.805), Time: 25.899[s]
v_strategy: non-static
EPOCH: 1, Train Loss: 0.611 (F1: 0.654, Acc: 0.658), Valid Loss: 0.490 (F1: 0.783, Acc: 0.776), Time: 1763.849[s]
EPOCH: 2, Train Loss: 0.370 (F1: 0.835, Acc: 0.837), Valid Loss: 0.484 (F1: 0.798, Acc: 0.776), Time: 3542.999[s]
EPOCH: 3, Train Loss: 0.227 (F1: 0.919, Acc: 0.920), Valid Loss: 0.487 (F1: 0.796, Acc: 0.794), Time: 5319.272[s]
EPOCH: 4, Train Loss: 0.121 (F1: 0.969, Acc: 0.969), Valid Loss: 0.527 (F1: 0.799, Acc: 0.786), Time: 7095.262[s]
EPOCH: 5, Train Loss: 0.058 (F1: 0.990, Acc: 0.990), Valid Loss: 0.583 (F1: 0.803, Acc: 0.792), Time: 8871.713[s]
EPOCH: 6, Train Loss: 0.029 (F1: 0.997, Acc: 0.997), Valid Loss: 0.634 (F1: 0.798, Acc: 0.794), Time: 10650.794[s]
EPOCH: 7, Train Loss: 0.015 (F1: 0.999, Acc: 0.999), Valid Loss: 0.688 (F1: 0.797, Acc: 0.794), Time: 12426.908[s]
EPOCH: 8, Train Loss: 0.009 (F1: 0.999, Acc: 0.999), Valid Loss: 0.740 (F1: 0.786, Acc: 0.784), Time: 14205.622[s]
EPOCH: 9, Train Loss: 0.006 (F1: 1.000, Acc: 1.000), Valid Loss: 0.781 (F1: 0.802, Acc: 0.794), Time: 15983.344[s]
EPOCH: 10, Train Loss: 0.004 (F1: 1.000, Acc: 1.000), Valid Loss: 0.819 (F1: 0.785, Acc: 0.784), Time: 17760.783[s]
v_strategy: rand
EPOCH: 1, Train Loss: 0.682 (F1: 0.578, Acc: 0.570), Valid Loss: 0.604 (F1: 0.704, Acc: 0.689), Time: 1767.448[s]
EPOCH: 2, Train Loss: 0.486 (F1: 0.780, Acc: 0.781), Valid Loss: 0.522 (F1: 0.752, Acc: 0.737), Time: 3548.673[s]
EPOCH: 3, Train Loss: 0.300 (F1: 0.889, Acc: 0.890), Valid Loss: 0.530 (F1: 0.746, Acc: 0.750), Time: 5327.865[s]
EPOCH: 4, Train Loss: 0.168 (F1: 0.949, Acc: 0.949), Valid Loss: 0.549 (F1: 0.771, Acc: 0.758), Time: 7107.400[s]
EPOCH: 5, Train Loss: 0.081 (F1: 0.983, Acc: 0.983), Valid Loss: 0.631 (F1: 0.763, Acc: 0.765), Time: 8886.359[s]
EPOCH: 6, Train Loss: 0.036 (F1: 0.995, Acc: 0.995), Valid Loss: 0.723 (F1: 0.757, Acc: 0.759), Time: 10662.619[s]
EPOCH: 7, Train Loss: 0.019 (F1: 0.998, Acc: 0.998), Valid Loss: 0.769 (F1: 0.761, Acc: 0.757), Time: 12433.836[s]
EPOCH: 8, Train Loss: 0.011 (F1: 0.999, Acc: 0.999), Valid Loss: 0.835 (F1: 0.753, Acc: 0.757), Time: 14207.155[s]
EPOCH: 9, Train Loss: 0.007 (F1: 1.000, Acc: 1.000), Valid Loss: 0.870 (F1: 0.761, Acc: 0.756), Time: 15979.763[s]
EPOCH: 10, Train Loss: 0.005 (F1: 1.000, Acc: 1.000), Valid Loss: 0.911 (F1: 0.760, Acc: 0.753), Time: 17749.891[s]
v_strategy: multichannel
EPOCH: 1, Train Loss: 0.626 (F1: 0.659, Acc: 0.661), Valid Loss: 0.480 (F1: 0.776, Acc: 0.773), Time: 1198.847[s]
EPOCH: 2, Train Loss: 0.334 (F1: 0.855, Acc: 0.856), Valid Loss: 0.493 (F1: 0.800, Acc: 0.774), Time: 2410.564[s]
EPOCH: 3, Train Loss: 0.171 (F1: 0.946, Acc: 0.947), Valid Loss: 0.479 (F1: 0.811, Acc: 0.805), Time: 3622.606[s]
EPOCH: 4, Train Loss: 0.075 (F1: 0.986, Acc: 0.987), Valid Loss: 0.506 (F1: 0.815, Acc: 0.810), Time: 4834.480[s]
EPOCH: 5, Train Loss: 0.034 (F1: 0.996, Acc: 0.996), Valid Loss: 0.557 (F1: 0.810, Acc: 0.797), Time: 6047.958[s]
EPOCH: 6, Train Loss: 0.016 (F1: 0.999, Acc: 0.999), Valid Loss: 0.588 (F1: 0.814, Acc: 0.811), Time: 7261.833[s]
EPOCH: 7, Train Loss: 0.010 (F1: 0.999, Acc: 0.999), Valid Loss: 0.615 (F1: 0.808, Acc: 0.803), Time: 8475.354[s]
EPOCH: 8, Train Loss: 0.006 (F1: 1.000, Acc: 1.000), Valid Loss: 0.659 (F1: 0.805, Acc: 0.809), Time: 9687.347[s]
EPOCH: 9, Train Loss: 0.005 (F1: 1.000, Acc: 1.000), Valid Loss: 0.668 (F1: 0.808, Acc: 0.801), Time: 10897.239[s]
EPOCH: 10, Train Loss: 0.004 (F1: 1.000, Acc: 1.000), Valid Loss: 0.693 (F1: 0.802, Acc: 0.794), Time: 12104.786[s]
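
The F1/Acc columns can be reproduced with scikit-learn, which is already a listed requirement; a minimal sketch on toy labels (the repo's exact call may differ):

from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0]  # toy gold labels
y_pred = [1, 0, 0, 1, 0]  # toy predictions
print("Acc: %.3f" % accuracy_score(y_true, y_pred))  # 0.800
print("F1:  %.3f" % f1_score(y_true, y_pred))        # binary F1, 0.800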

Notes

  • All experiments were run on a GeForce GTX 1060 (6GB).
  • The Adam optimizer is used in all experiments (the original paper used Adadelta); switching optimizers is sketched below.
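
Switching between the two optimizers is a one-line change in DyNet:

import dynet as dy

pc = dy.ParameterCollection()
trainer = dy.AdamTrainer(pc)        # used in the experiments above
# trainer = dy.AdadeltaTrainer(pc)  # the paper's original choice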

References

  • [1] Y. Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of EMNLP 2014 [pdf]
  • [2] B. Pang and L. Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of ACL 2005 [pdf]
  • [3] Google News corpus word vectors [link]