Tensorflow and Keras Implementation of Very Deep Convolutional Neural Network for Text Classification

VDCNN

Tensorflow Implementation of Very Deep Convolutional Neural Network for Text Classification.

Note

This repository is a simple Keras implementation of the VDCNN model proposed by Conneau et al. (see the paper for VDCNN).

Note: temporal batch norm is not implemented. "Temp batch norm applies same kind of regularization as batch norm, except that the activations in a mini-batch are jointly normalized over temporal instead of spatial locations." For now, this project uses regular Tensorflow batch normalization only.
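To make the quoted definition concrete, here is a minimal NumPy sketch of the training-time computation, assuming activations shaped (batch, time, channels). The function name `temporal_batch_norm` and the `eps` default are illustrative and not part of this repository; the running averages needed at inference time are left out.

```python
import numpy as np


def temporal_batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each channel jointly over the batch and temporal axes.

    x:     activations with shape (batch, time, channels)
    gamma: per-channel scale with shape (channels,)
    beta:  per-channel shift with shape (channels,)
    """
    # Statistics are pooled over every sample and every time step.
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


# Usage: 8 sequences, 16 time steps, 4 feature maps.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 16, 4))
y = temporal_batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

After the call, each channel of `y` has approximately zero mean and unit variance when measured across both the batch and the time axis.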

See another VDCNN implementation in Pytorch if you are more comfortable with Pytorch; its author also provides detailed reproduced results. See the original Tensorflow implementation as well.

Note that the VDCNN paper states that the original implementation was done in Torch 7.

Prerequisites

  • Python3
  • Tensorflow 1.0 or higher
  • Keras 2.1.5 or higher
  • Numpy

Datasets

The original paper tests several NLP datasets, including DBPedia, AG's News, and Sogou News. "data_helper.py" reads train and test files in CSV format.
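As a rough illustration of what a CSV-based loader for character-level input can look like, the sketch below reads rows and turns each text into a fixed-length list of character indices. Everything in it is an assumption rather than the actual contents of "data_helper.py": the row layout (a 1-based class label in the first column, text in the remaining columns), the alphabet, the sequence length of 1024, and the names `encode_text` and `load_csv`.

```python
import csv
import io
import string

# Assumed alphabet; index 0 is reserved for padding and unknown characters.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
SEQUENCE_LENGTH = 1024  # assumed fixed input length


def encode_text(text, length=SEQUENCE_LENGTH):
    """Map a string to `length` character indices, truncating or zero-padding."""
    ids = [CHAR_TO_INDEX.get(ch, 0) for ch in text.lower()[:length]]
    return ids + [0] * (length - len(ids))


def load_csv(handle, length=SEQUENCE_LENGTH):
    """Read rows of (label, text fields...) from an open CSV file object."""
    sequences, labels = [], []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        labels.append(int(row[0]) - 1)  # assumed 1-based labels, shifted to 0-based
        sequences.append(encode_text(" ".join(row[1:]), length))
    return sequences, labels


# Usage with an in-memory row instead of a real train.csv file.
sample = io.StringIO('"3","Wall St. Bears","Short-sellers are seeing green again."\n')
sequences, labels = load_csv(sample, length=32)
```

With a real dataset, the same function could be called on a file opened with `open(path, newline="", encoding="utf-8")`.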

Downloads of those NLP text classification datasets can be found here (Many thanks to ArdalanM):

Dataset                | Classes | Train samples | Test samples | Source
AG’s News              | 4       | 120 000       | 7 600        | link
Sogou News             | 5       | 450 000       | 60 000       | link
DBPedia                | 14      | 560 000       | 70 000       | link
Yelp Review Polarity   | 2       | 560 000       | 38 000       | link
Yelp Review Full       | 5       | 650 000       | 50 000       | link
Yahoo! Answers         | 10      | 1 400 000     | 60 000       | link
Amazon Review Full     | 5       | 3 000 000     | 650 000      | link
Amazon Review Polarity | 2       | 3 600 000     | 400 000      | link

Parameters Setting

For all versions of VDCNN, training and testing are done on an Ubuntu 16.04 server with a Tesla K80, using a Momentum optimizer with a momentum of 0.9, exponential learning rate decay, an evaluation interval of 25, and a batch size of 128. Weights are initialized with He initialization, proposed in He et al. Batch normalization layers use a decay of 0.999.
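The three ingredients named above can be sketched in plain Python as follows. This is an illustration of the general techniques, not this repository's training code: the initial learning rate, `decay_steps`, and `decay_rate` values in the usage lines are hypothetical, and the function names are made up for the example.

```python
import math
import random


def exponential_decay(initial_lr, step, decay_steps, decay_rate):
    """Learning rate after `step` updates: initial_lr * decay_rate^(step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)


def he_normal(fan_in, fan_out, rng):
    """He initialization: zero-mean Gaussian weights with std sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


def momentum_step(weight, grad, velocity, lr, momentum=0.9):
    """One momentum update: accumulate the gradient, then move against it."""
    velocity = momentum * velocity + grad
    return weight - lr * velocity, velocity


# Usage with hypothetical schedule values.
lr_start = exponential_decay(0.01, step=0, decay_steps=1000, decay_rate=0.95)
lr_later = exponential_decay(0.01, step=1000, decay_steps=1000, decay_rate=0.95)
weights = he_normal(fan_in=200, fan_out=50, rng=random.Random(0))
new_weight, new_velocity = momentum_step(1.0, grad=0.5, velocity=0.0, lr=0.1)
```

The decay schedule multiplies the learning rate by `decay_rate` once every `decay_steps` updates, and the momentum term of 0.9 lets past gradients keep contributing to later steps.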

(There are tons of factors that can influence the testing accuracy of the model, but overall this project should be good to go. Training a deep CNN model is not an easy task; patience is everything. -_-)

Experiments

TODO: Testing of more NLP benchmark datasets and presenting detailed results.

Results are reported as follows: (i) / (ii)

  • (i): Test set accuracy reported by the paper (acc = 100% - error_rate)
  • (ii): Test set accuracy reproduced by this Keras implementation
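As a quick check of the conversion used for column (i), the snippet below turns an error rate in percent into an accuracy. The helper name `accuracy_from_error` is only for illustration.

```python
def accuracy_from_error(error_rate_percent):
    """Convert a test error rate in percent to accuracy in percent."""
    return round(100.0 - error_rate_percent, 2)


# An error rate of 9.17% corresponds to the 90.83 entry for 9 layers on ag_news.
accuracy = accuracy_from_error(9.17)
```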

Results for Max Pooling:

Depth     | ag_news         | DBPedia         | Sogou News
9 layers  | 90.83 / xx.xxxx | 98.65 / xx.xxxx | 96.30 / xx.xxxx
17 layers | 91.12 / xx.xxxx | 98.60 / xx.xxxx | 96.46 / xx.xxxx
29 layers | 91.27 / xx.xxxx | 98.71 / xx.xxxx | 96.64 / xx.xxxx

Results for K-max Pooling:

Depth     | ag_news         | DBPedia         | Sogou News
9 layers  | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx
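
For readers unfamiliar with the operation, k-max pooling keeps the k largest activations along the temporal axis of each feature map and preserves their original order. Below is a minimal NumPy sketch, assuming inputs shaped (batch, time, channels); the function name `k_max_pooling` is hypothetical and this is not the layer used in this repository.

```python
import numpy as np


def k_max_pooling(x, k):
    """Keep the k largest values along the time axis, in their original order.

    x: array with shape (batch, time, channels); returns (batch, k, channels).
    """
    # Positions of the k largest values per (sample, channel).
    top_idx = np.argsort(x, axis=1)[:, -k:, :]
    # Sort the positions so the selected values stay in temporal order.
    top_idx = np.sort(top_idx, axis=1)
    return np.take_along_axis(x, top_idx, axis=1)


# Usage: one sequence, five time steps, one feature map.
x = np.array([[[1.0], [5.0], [3.0], [4.0], [2.0]]])
pooled = k_max_pooling(x, k=3)
```

In this example the three largest values are 5, 4 and 3, and the output lists them in the order they occur in the sequence: 5, 3, 4.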

Results for Conv downsampling:

Depth     | ag_news         | DBPedia         | Sogou News
9 layers  | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx

Results for Max Pooling with Shortcut:

Depth     | ag_news         | DBPedia         | Sogou News
9 layers  | 90.83 / xx.xxxx | 98.65 / xx.xxxx | 96.30 / xx.xxxx
17 layers | 91.12 / xx.xxxx | 98.60 / xx.xxxx | 96.46 / xx.xxxx
29 layers | 91.27 / xx.xxxx | 98.71 / xx.xxxx | 96.64 / xx.xxxx
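
The "with Shortcut" variants add a block's input to its output, in the style of residual networks. The NumPy sketch below assumes inputs shaped (batch, time, channels) and, when the block halves the temporal length or changes the channel count, a strided 1x1 projection on the shortcut path. That projection is one common way to match shapes and is an assumption here, not necessarily what this repository does; `add_shortcut` is a hypothetical name.

```python
import numpy as np


def add_shortcut(block_input, block_output, projection=None, stride=1):
    """Add a shortcut connection around a block.

    block_input:  (batch, time, in_channels)
    block_output: (batch, time // stride, out_channels)
    projection:   optional (in_channels, out_channels) matrix, equivalent to a
                  1x1 convolution applied at every kept time step
    """
    # Subsample the time axis so the shortcut matches a downsampled output.
    shortcut = block_input[:, ::stride, :]
    if projection is not None:
        shortcut = shortcut @ projection
    return shortcut + block_output


# Usage: the block halves the sequence length and doubles the channels.
x = np.ones((2, 8, 4))
block_out = np.zeros((2, 4, 8))
out = add_shortcut(x, block_out, projection=np.ones((4, 8)), stride=2)
```

When the block keeps both the temporal length and the channel count, the call reduces to the identity shortcut `block_input + block_output`.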

Results for K-max Pooling with Shortcut:

Depth     | ag_news         | DBPedia         | Sogou News
9 layers  | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx

Results for Conv downsampling with Shortcut:

Depth     | ag_news         | DBPedia         | Sogou News
9 layers  | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx

Reference

Original preprocessing code and VDCNN implementation by geduo15

Train Script and data iterator from Convolutional Neural Network for Text Classification

NLP Datasets Gathered by ArdalanM and Others