Convolutional Neural Networks for Sentence Classification

Unofficial DyNet implementation of the paper Convolutional Neural Networks for Sentence Classification (EMNLP 2014) [1].

1. Requirements

  • Python 3.6.0+
  • DyNet 2.0+
  • NumPy 1.12.1+
  • gensim 2.3.0+
  • scikit-learn 0.19.0+
  • tqdm 4.15.0+

2. Prepare dataset

To get the movie review data [2] and pretrained word embeddings [3], run

sh data_download.sh

and

python preprocess.py

If you use your own dataset, specify the paths to the training and validation data files with the command-line arguments described below.
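
The exact file format is defined by preprocess.py, but presumably each *_x file holds one whitespace-tokenized sentence per line and each *_y file one integer label per line. A minimal loader sketch under that assumption (the function name is illustrative):

def load_data(x_path, y_path):
    # Assumed format: one tokenized sentence per line in x,
    # one integer label (0/1) per line in y; preprocess.py may differ.
    with open(x_path, encoding="utf-8") as fx, open(y_path, encoding="utf-8") as fy:
        xs = [line.split() for line in fx]
        ys = [int(line) for line in fy]
    assert len(xs) == len(ys), "x and y line counts must match"
    return xs, ys

train_x, train_y = load_data("./data/train_x.txt", "./data/train_y.txt")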

3. Train

Arguments

  • --gpu: GPU ID to use. For CPU, set -1 [default: 0]
  • --train_x_path: File path of train x data [default: ./data/train_x.txt]
  • --train_y_path: File path of train y data [default: ./data/train_y.txt]
  • --valid_x_path: File path of valid x data [default: ./data/valid_x.txt]
  • --valid_y_path: File path of valid y data [default: ./data/valid_y.txt]
  • --n_epochs: Number of epochs [default: 10]
  • --batch_size: Mini-batch size [default: 64]
  • --win_sizes: Window sizes of filters [default: [3, 4, 5]]
  • --num_fil: Number of filters per window size (see the sketch after this list) [default: 100]
  • --s: L2-norm constraint on the weight vectors (max-norm regularization) [default: 3.0]
  • --dropout_prob: Dropout probability [default: 0.5]
  • --v_strategy: Embedding strategy [default: non-static]
    • rand: Initialize embeddings randomly.
    • static: Load pretrained embeddings and keep them fixed during training.
    • non-static: Load pretrained embeddings and fine-tune them during training.
    • multichannel: Load pretrained embeddings as two channels; fine-tune one and keep the other fixed.
  • --alloc_mem: Amount of memory to allocate [MB] [default: 4096]
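
To make --win_sizes, --num_fil, --dropout_prob, and --s concrete, here is a minimal DyNet sketch of the forward pass and the max-norm rescale. All names and dimensions are illustrative assumptions, not the repo's actual code; the dy.parameter() wrapping follows the DyNet 2.0 API.

import numpy as np
import dynet as dy

VOCAB_SIZE, EMB_DIM = 20000, 300     # hypothetical vocabulary/embedding sizes
WIN_SIZES, NUM_FIL = [3, 4, 5], 100  # --win_sizes and --num_fil defaults

pc = dy.ParameterCollection()
E = pc.add_lookup_parameters((VOCAB_SIZE, EMB_DIM))       # word embeddings
Fs = [pc.add_parameters((w, EMB_DIM, 1, NUM_FIL)) for w in WIN_SIZES]
W_out = pc.add_parameters((2, NUM_FIL * len(WIN_SIZES)))  # 2-class softmax layer
b_out = pc.add_parameters(2)

def forward(word_ids, train=True, dropout_prob=0.5):
    # Stack word vectors into a (seq_len, EMB_DIM, 1) "image".
    sent = dy.concatenate([dy.reshape(E[i], (1, EMB_DIM, 1)) for i in word_ids])
    pooled = []
    for w, F in zip(WIN_SIZES, Fs):
        conv = dy.rectify(dy.conv2d(sent, dy.parameter(F), stride=[1, 1], is_valid=True))
        n = len(word_ids) - w + 1  # positions left after a valid convolution
        pooled.append(dy.max_dim(dy.reshape(conv, (n, NUM_FIL)), d=0))  # max-over-time
    h = dy.concatenate(pooled)     # len(WIN_SIZES) * NUM_FIL features
    if train:
        h = dy.dropout(h, dropout_prob)  # --dropout_prob
    return dy.parameter(W_out) * h + dy.parameter(b_out)

def rescale(W, s=3.0):
    # Simplified max-norm constraint (--s): rescale the whole matrix when its
    # L2 norm exceeds s (the paper constrains individual weight vectors).
    w = W.as_array()
    norm = np.linalg.norm(w)
    if norm > s:
        W.set_value(w * (s / norm))

# Usage: one computation graph per example.
dy.renew_cg()
loss = dy.pickneglogsoftmax(forward([12, 7, 431, 2]), 1)  # hypothetical word ids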

Command example

python train_manualbatch.py --n_epochs 20

4. Test

Arguments

  • --gpu: GPU ID to use. For CPU, set -1 [default: 0]
  • --model_file: Model to use for prediction [default: ./model]
  • --input_file: Input file path [default: ./data/valid_x.txt]
  • --output_file: Output file path [default: ./pred_y.txt]
  • --w2i_file: Word2Index file path (see the sketch after this list) [default: ./w2i.dump]
  • --i2w_file: Index2Word file path [default: ./i2w.dump]
  • --alloc_mem: Amount of memory to allocate [MB] [default: 1024]
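
The w2i/i2w files are presumably pickled word-to-index and index-to-word dicts; a sketch of mapping a raw sentence to indices under that assumption (the <unk> key is hypothetical):

import pickle

with open("./w2i.dump", "rb") as f:
    w2i = pickle.load(f)  # assumed: dict mapping word -> index

sentence = "a gripping and moving film".split()
unk = w2i.get("<unk>", 0)  # hypothetical unknown-word index
ids = [w2i.get(w, unk) for w in sentence]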

Command example

python test.py

5. Results

All models below are trained with the default arguments except --v_strategy; a sketch of how the F1/Acc columns can be computed follows the logs.

v_strategy: static
EPOCH: 1, Train Loss: 0.636 (F1: 0.640, Acc: 0.642), Valid Loss: 0.567 (F1: 0.606, Acc: 0.694), Time: 1.614[s]
EPOCH: 2, Train Loss: 0.474 (F1: 0.770, Acc: 0.774), Valid Loss: 0.494 (F1: 0.734, Acc: 0.761), Time: 4.307[s]
EPOCH: 3, Train Loss: 0.393 (F1: 0.829, Acc: 0.830), Valid Loss: 0.460 (F1: 0.776, Acc: 0.785), Time: 6.987[s]
EPOCH: 4, Train Loss: 0.329 (F1: 0.866, Acc: 0.867), Valid Loss: 0.454 (F1: 0.782, Acc: 0.789), Time: 9.686[s]
EPOCH: 5, Train Loss: 0.272 (F1: 0.897, Acc: 0.898), Valid Loss: 0.452 (F1: 0.783, Acc: 0.792), Time: 12.384[s]
EPOCH: 6, Train Loss: 0.217 (F1: 0.929, Acc: 0.929), Valid Loss: 0.445 (F1: 0.808, Acc: 0.809), Time: 15.088[s]
EPOCH: 7, Train Loss: 0.167 (F1: 0.956, Acc: 0.956), Valid Loss: 0.446 (F1: 0.813, Acc: 0.810), Time: 17.798[s]
EPOCH: 8, Train Loss: 0.129 (F1: 0.971, Acc: 0.972), Valid Loss: 0.452 (F1: 0.810, Acc: 0.805), Time: 20.509[s]
EPOCH: 9, Train Loss: 0.102 (F1: 0.981, Acc: 0.981), Valid Loss: 0.458 (F1: 0.809, Acc: 0.806), Time: 23.202[s]
EPOCH: 10, Train Loss: 0.086 (F1: 0.988, Acc: 0.988), Valid Loss: 0.459 (F1: 0.810, Acc: 0.805), Time: 25.899[s]
v_strategy: non-static
EPOCH: 1, Train Loss: 0.611 (F1: 0.654, Acc: 0.658), Valid Loss: 0.490 (F1: 0.783, Acc: 0.776), Time: 1763.849[s]
EPOCH: 2, Train Loss: 0.370 (F1: 0.835, Acc: 0.837), Valid Loss: 0.484 (F1: 0.798, Acc: 0.776), Time: 3542.999[s]
EPOCH: 3, Train Loss: 0.227 (F1: 0.919, Acc: 0.920), Valid Loss: 0.487 (F1: 0.796, Acc: 0.794), Time: 5319.272[s]
EPOCH: 4, Train Loss: 0.121 (F1: 0.969, Acc: 0.969), Valid Loss: 0.527 (F1: 0.799, Acc: 0.786), Time: 7095.262[s]
EPOCH: 5, Train Loss: 0.058 (F1: 0.990, Acc: 0.990), Valid Loss: 0.583 (F1: 0.803, Acc: 0.792), Time: 8871.713[s]
EPOCH: 6, Train Loss: 0.029 (F1: 0.997, Acc: 0.997), Valid Loss: 0.634 (F1: 0.798, Acc: 0.794), Time: 10650.794[s]
EPOCH: 7, Train Loss: 0.015 (F1: 0.999, Acc: 0.999), Valid Loss: 0.688 (F1: 0.797, Acc: 0.794), Time: 12426.908[s]
EPOCH: 8, Train Loss: 0.009 (F1: 0.999, Acc: 0.999), Valid Loss: 0.740 (F1: 0.786, Acc: 0.784), Time: 14205.622[s]
EPOCH: 9, Train Loss: 0.006 (F1: 1.000, Acc: 1.000), Valid Loss: 0.781 (F1: 0.802, Acc: 0.794), Time: 15983.344[s]
EPOCH: 10, Train Loss: 0.004 (F1: 1.000, Acc: 1.000), Valid Loss: 0.819 (F1: 0.785, Acc: 0.784), Time: 17760.783[s]
v_strategy: rand
EPOCH: 1, Train Loss: 0.682 (F1: 0.578, Acc: 0.570), Valid Loss: 0.604 (F1: 0.704, Acc: 0.689), Time: 1767.448[s]
EPOCH: 2, Train Loss: 0.486 (F1: 0.780, Acc: 0.781), Valid Loss: 0.522 (F1: 0.752, Acc: 0.737), Time: 3548.673[s]
EPOCH: 3, Train Loss: 0.300 (F1: 0.889, Acc: 0.890), Valid Loss: 0.530 (F1: 0.746, Acc: 0.750), Time: 5327.865[s]
EPOCH: 4, Train Loss: 0.168 (F1: 0.949, Acc: 0.949), Valid Loss: 0.549 (F1: 0.771, Acc: 0.758), Time: 7107.400[s]
EPOCH: 5, Train Loss: 0.081 (F1: 0.983, Acc: 0.983), Valid Loss: 0.631 (F1: 0.763, Acc: 0.765), Time: 8886.359[s]
EPOCH: 6, Train Loss: 0.036 (F1: 0.995, Acc: 0.995), Valid Loss: 0.723 (F1: 0.757, Acc: 0.759), Time: 10662.619[s]
EPOCH: 7, Train Loss: 0.019 (F1: 0.998, Acc: 0.998), Valid Loss: 0.769 (F1: 0.761, Acc: 0.757), Time: 12433.836[s]
EPOCH: 8, Train Loss: 0.011 (F1: 0.999, Acc: 0.999), Valid Loss: 0.835 (F1: 0.753, Acc: 0.757), Time: 14207.155[s]
EPOCH: 9, Train Loss: 0.007 (F1: 1.000, Acc: 1.000), Valid Loss: 0.870 (F1: 0.761, Acc: 0.756), Time: 15979.763[s]
EPOCH: 10, Train Loss: 0.005 (F1: 1.000, Acc: 1.000), Valid Loss: 0.911 (F1: 0.760, Acc: 0.753), Time: 17749.891[s]
v_strategy: multichannel
EPOCH: 1, Train Loss: 0.626 (F1: 0.659, Acc: 0.661), Valid Loss: 0.480 (F1: 0.776, Acc: 0.773), Time: 1198.847[s]
EPOCH: 2, Train Loss: 0.334 (F1: 0.855, Acc: 0.856), Valid Loss: 0.493 (F1: 0.800, Acc: 0.774), Time: 2410.564[s]
EPOCH: 3, Train Loss: 0.171 (F1: 0.946, Acc: 0.947), Valid Loss: 0.479 (F1: 0.811, Acc: 0.805), Time: 3622.606[s]
EPOCH: 4, Train Loss: 0.075 (F1: 0.986, Acc: 0.987), Valid Loss: 0.506 (F1: 0.815, Acc: 0.810), Time: 4834.480[s]
EPOCH: 5, Train Loss: 0.034 (F1: 0.996, Acc: 0.996), Valid Loss: 0.557 (F1: 0.810, Acc: 0.797), Time: 6047.958[s]
EPOCH: 6, Train Loss: 0.016 (F1: 0.999, Acc: 0.999), Valid Loss: 0.588 (F1: 0.814, Acc: 0.811), Time: 7261.833[s]
EPOCH: 7, Train Loss: 0.010 (F1: 0.999, Acc: 0.999), Valid Loss: 0.615 (F1: 0.808, Acc: 0.803), Time: 8475.354[s]
EPOCH: 8, Train Loss: 0.006 (F1: 1.000, Acc: 1.000), Valid Loss: 0.659 (F1: 0.805, Acc: 0.809), Time: 9687.347[s]
EPOCH: 9, Train Loss: 0.005 (F1: 1.000, Acc: 1.000), Valid Loss: 0.668 (F1: 0.808, Acc: 0.801), Time: 10897.239[s]
EPOCH: 10, Train Loss: 0.004 (F1: 1.000, Acc: 1.000), Valid Loss: 0.693 (F1: 0.802, Acc: 0.794), Time: 12104.786[s]
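
The F1/Acc columns can be reproduced with scikit-learn, which is already a listed requirement; a minimal sketch on toy labels (the repo's exact call may differ):

from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0]  # toy gold labels
y_pred = [1, 0, 0, 1, 0]  # toy predictions
print("Acc: %.3f" % accuracy_score(y_true, y_pred))  # 0.800
print("F1:  %.3f" % f1_score(y_true, y_pred))        # binary F1, 0.800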

Notes

  • All experiments were run on a GeForce GTX 1060 (6GB).
  • The Adam optimizer is used in all experiments (the original paper used Adadelta); switching optimizers is sketched below.
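
Switching between the two optimizers is a one-line change in DyNet:

import dynet as dy

pc = dy.ParameterCollection()
trainer = dy.AdamTrainer(pc)        # used in the experiments above
# trainer = dy.AdadeltaTrainer(pc)  # the paper's original choice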

References

  • [1] Y. Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of EMNLP 2014 [pdf]
  • [2] B. Pang and L. Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of ACL 2005 [pdf]
  • [3] Google News corpus word vectors [link]