TensorFlow Implementation of Very Deep Convolutional Neural Network for Text Classification.
This repository is a simple Keras implementation of the VDCNN model proposed by Conneau et al. (Paper for VDCNN).
Note: temporal batch norm is not implemented. "Temp batch norm applies same kind of regularization as batch norm, except that the activations in a mini-batch are jointly normalized over temporal instead of spatial locations." For now, this project uses regular TensorFlow batch normalization only.
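For reference, here is a minimal sketch (not the repository's exact layer code) of a convolutional block with regular Keras batch normalization; the `momentum=0.999` value follows the batch-norm decay mentioned in the training setup below:

```python
from keras.layers import Activation, BatchNormalization, Conv1D

def conv_block(x, filters, kernel_size=3):
    """Temporal convolution followed by standard batch normalization.

    As noted above, activations are normalized per feature map by the
    regular BatchNormalization layer, not jointly over temporal locations
    as the paper's temporal batch norm would do.
    """
    x = Conv1D(filters, kernel_size, padding='same',
               kernel_initializer='he_normal')(x)
    x = BatchNormalization(momentum=0.999)(x)  # decay 0.999, see training setup below
    return Activation('relu')(x)
```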
If you are more comfortable with PyTorch, see another VDCNN implementation in PyTorch, whose author also provides detailed reproduced results. The original TensorFlow implementation is available as well.
It should be noted that, according to the VDCNN paper, the original implementation was done in Torch 7.
- Python 3
- TensorFlow 1.0 or higher
- Keras 2.1.5 or higher
- NumPy
The original paper evaluates several NLP text classification datasets, including DBPedia, AG's News, and Sogou News. "data_helper.py" operates on CSV-format train and test files.
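As a rough illustration of that format, the sketch below reads a `label,text,...` CSV and quantizes each text into a fixed-length sequence of character indices. The alphabet, sequence length, and 1-based labels shown here are assumptions; see "data_helper.py" for the actual preprocessing.

```python
import csv
import numpy as np

# Assumed character alphabet and sequence length; the repository's
# data_helper.py may use slightly different values.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/|_#$%^&*~`+=<>()[]{}"
CHAR_INDEX = {c: i + 1 for i, c in enumerate(ALPHABET)}  # index 0 is reserved for padding
SEQUENCE_LENGTH = 1024

def load_csv(path):
    """Read rows of (label, text...) and quantize each text at character level."""
    labels, sequences = [], []
    with open(path, newline='', encoding='utf-8') as f:
        for row in csv.reader(f):
            labels.append(int(row[0]) - 1)                 # labels are assumed 1-based in these releases
            text = " ".join(row[1:]).lower()[:SEQUENCE_LENGTH]
            seq = [CHAR_INDEX.get(c, 0) for c in text]
            seq += [0] * (SEQUENCE_LENGTH - len(seq))      # pad to a fixed length
            sequences.append(seq)
    return np.array(sequences, dtype=np.int32), np.array(labels, dtype=np.int32)
```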
Downloads of these NLP text classification datasets can be found here (many thanks to ArdalanM):
Dataset | Classes | Train samples | Test samples | Source |
---|---|---|---|---|
AG’s News | 4 | 120 000 | 7 600 | link |
Sogou News | 5 | 450 000 | 60 000 | link |
DBPedia | 14 | 560 000 | 70 000 | link |
Yelp Review Polarity | 2 | 560 000 | 38 000 | link |
Yelp Review Full | 5 | 650 000 | 50 000 | link |
Yahoo! Answers | 10 | 1 400 000 | 60 000 | link |
Amazon Review Full | 5 | 3 000 000 | 650 000 | link |
Amazon Review Polarity | 2 | 3 600 000 | 400 000 | link |
For all versions of VDCNN, training and testing are done on an Ubuntu 16.04 server with a Tesla K80, using a Momentum optimizer with momentum 0.9, exponential learning rate decay, an evaluation interval of 25, and a batch size of 128. Weights are initialized with the He initialization proposed in He et al. Batch normalization uses a decay of 0.999.
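The sketch below shows what that optimizer setup might look like in TensorFlow 1.x; the initial learning rate and decay schedule values are assumptions, not the repository's exact settings:

```python
import tensorflow as tf

def build_train_op(loss):
    """Momentum optimizer (0.9) with exponential learning rate decay."""
    global_step = tf.train.get_or_create_global_step()
    learning_rate = tf.train.exponential_decay(
        learning_rate=0.01,    # assumed initial learning rate
        global_step=global_step,
        decay_steps=10000,     # assumed decay interval
        decay_rate=0.95,       # assumed decay rate
        staircase=True)
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
    # Make sure batch-norm moving averages are updated with each training step.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        return optimizer.minimize(loss, global_step=global_step)
```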
(There are many factors that can influence the test accuracy of the model, but overall this project should be good to go. Training a deep CNN model is not an easy task; patience is everything. -_-)
TODO: test more NLP benchmark datasets and present detailed results.
Results are reported as follows: (i) / (ii)
- (i): Test set accuracy reported by the paper (acc = 100% - error_rate)
- (ii): Test set accuracy reproduced by this Keras implementation
Results for Max Pooling:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.83 / xx.xxxx | 98.65 / xx.xxxx | 96.30 / xx.xxxx |
17 layers | 91.12 / xx.xxxx | 98.60 / xx.xxxx | 96.46 / xx.xxxx |
29 layers | 91.27 / xx.xxxx | 98.71 / xx.xxxx | 96.64 / xx.xxxx |
Results for K-max Pooling:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx |
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx |
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx |
Results for Conv downsampling:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx |
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx |
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx |
Results for Max Pooling with Shortcut:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.83 / xx.xxxx | 98.65 / xx.xxxx | 96.30 / xx.xxxx |
17 layers | 91.12 / xx.xxxx | 98.60 / xx.xxxx | 96.46 / xx.xxxx |
29 layers | 91.27 / xx.xxxx | 98.71 / xx.xxxx | 96.64 / xx.xxxx |
Results for K-max Pooling with Shortcut:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx |
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx |
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx |
Results for Conv downsampling with Shortcut:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx |
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx |
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx |
Original preprocessing code and VDCNN implementation by geduo15
Training script and data iterator from Convolutional Neural Network for Text Classification