TensorFlow Implementation of Very Deep Convolutional Neural Network for Text Classification.
This repository is a simple Keras implementation of the VDCNN model proposed by Conneau et al. (Paper for VDCNN).
Note: temporal batch norm is not implemented. "Temp batch norm applies same kind of regularization as batch norm, except that the activations in a mini-batch are jointly normalized over temporal instead of spatial locations." For now, this project uses regular TensorFlow batch normalization only.
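For reference, here is a minimal sketch (not the repository's exact layer code) of a convolutional block with regular Keras batch normalization; the `momentum=0.999` value follows the batch-norm decay mentioned in the training setup below:

```python
from keras.layers import Activation, BatchNormalization, Conv1D

def conv_block(x, filters, kernel_size=3):
    """Temporal convolution followed by standard batch normalization.

    As noted above, activations are normalized per feature map by the
    regular BatchNormalization layer, not jointly over temporal locations
    as the paper's temporal batch norm would do.
    """
    x = Conv1D(filters, kernel_size, padding='same',
               kernel_initializer='he_normal')(x)
    x = BatchNormalization(momentum=0.999)(x)  # decay 0.999, see training setup below
    return Activation('relu')(x)
```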
If you are more comfortable with PyTorch, see another VDCNN implementation in PyTorch, whose author also provides detailed reproduced results. The original TensorFlow implementation is available as well.
It should be noted that, according to the VDCNN paper, the original implementation was done in Torch 7.
- Python 3
- TensorFlow 1.0 or higher
- Keras 2.1.5 or higher
- NumPy
The original paper evaluates several NLP text classification datasets, including DBPedia, AG's News, and Sogou News. "data_helper.py" operates on CSV-format train and test files.
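As a rough illustration of that format, the sketch below reads a `label,text,...` CSV and quantizes each text into a fixed-length sequence of character indices. The alphabet, sequence length, and 1-based labels shown here are assumptions; see "data_helper.py" for the actual preprocessing.

```python
import csv
import numpy as np

# Assumed character alphabet and sequence length; the repository's
# data_helper.py may use slightly different values.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/|_#$%^&*~`+=<>()[]{}"
CHAR_INDEX = {c: i + 1 for i, c in enumerate(ALPHABET)}  # index 0 is reserved for padding
SEQUENCE_LENGTH = 1024

def load_csv(path):
    """Read rows of (label, text...) and quantize each text at character level."""
    labels, sequences = [], []
    with open(path, newline='', encoding='utf-8') as f:
        for row in csv.reader(f):
            labels.append(int(row[0]) - 1)                 # labels are assumed 1-based in these releases
            text = " ".join(row[1:]).lower()[:SEQUENCE_LENGTH]
            seq = [CHAR_INDEX.get(c, 0) for c in text]
            seq += [0] * (SEQUENCE_LENGTH - len(seq))      # pad to a fixed length
            sequences.append(seq)
    return np.array(sequences, dtype=np.int32), np.array(labels, dtype=np.int32)
```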
Downloads of these NLP text classification datasets can be found here (many thanks to ArdalanM):
Dataset | Classes | Train samples | Test samples | Source |
---|---|---|---|---|
AG’s News | 4 | 120 000 | 7 600 | link |
Sogou News | 5 | 450 000 | 60 000 | link |
DBPedia | 14 | 560 000 | 70 000 | link |
Yelp Review Polarity | 2 | 560 000 | 38 000 | link |
Yelp Review Full | 5 | 650 000 | 50 000 | link |
Yahoo! Answers | 10 | 1 400 000 | 60 000 | link |
Amazon Review Full | 5 | 3 000 000 | 650 000 | link |
Amazon Review Polarity | 2 | 3 600 000 | 400 000 | link |
For all versions of VDCNN, training and testing are done on an Ubuntu 16.04 server with a Tesla K80, using a Momentum optimizer with momentum 0.9, exponential learning rate decay, an evaluation interval of 25, and a batch size of 128. Weights are initialized with the He initialization proposed in He et al. Batch normalization uses a decay of 0.999.
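The sketch below shows what that optimizer setup might look like in TensorFlow 1.x; the initial learning rate and decay schedule values are assumptions, not the repository's exact settings:

```python
import tensorflow as tf

def build_train_op(loss):
    """Momentum optimizer (0.9) with exponential learning rate decay."""
    global_step = tf.train.get_or_create_global_step()
    learning_rate = tf.train.exponential_decay(
        learning_rate=0.01,    # assumed initial learning rate
        global_step=global_step,
        decay_steps=10000,     # assumed decay interval
        decay_rate=0.95,       # assumed decay rate
        staircase=True)
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
    # Make sure batch-norm moving averages are updated with each training step.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        return optimizer.minimize(loss, global_step=global_step)
```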
(There are many factors that can influence the test accuracy of the model, but overall this project should be good to go. Training a deep CNN model is not an easy task; patience is everything. -_-)
TODO: test more NLP benchmark datasets and present detailed results.
Results are reported as follows: (i) / (ii)
- (i): Test set accuracy reported by the paper (acc = 100% - error_rate)
- (ii): Test set accuracy reproduced by this Keras implementation
Results for Max Pooling:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.83 / xx.xxxx | 98.65 / xx.xxxx | 96.30 / xx.xxxx |
17 layers | 91.12 / xx.xxxx | 98.60 / xx.xxxx | 96.46 / xx.xxxx |
29 layers | 91.27 / xx.xxxx | 98.71 / xx.xxxx | 96.64 / xx.xxxx |
Results for K-max Pooling:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx |
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx |
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx |
Results for Conv downsampling:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx |
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx |
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx |
Results for Max Pooling with Shortcut:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.83 / xx.xxxx | 98.65 / xx.xxxx | 96.30 / xx.xxxx |
17 layers | 91.12 / xx.xxxx | 98.60 / xx.xxxx | 96.46 / xx.xxxx |
29 layers | 91.27 / xx.xxxx | 98.71 / xx.xxxx | 96.64 / xx.xxxx |
Results for K-max Pooling with Shortcut:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx |
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx |
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx |
Results for Conv downsampling with Shortcut:
Depth | AG's News | DBPedia | Sogou News |
---|---|---|---|
9 layers | 90.17 / xx.xxxx | 98.44 / xx.xxxx | 96.42 / xx.xxxx |
17 layers | 90.61 / xx.xxxx | 98.39 / xx.xxxx | 96.49 / xx.xxxx |
29 layers | 91.33 / xx.xxxx | 98.59 / xx.xxxx | 96.82 / xx.xxxx |
Original preprocessing code and VDCNN implementation by geduo15
Training script and data iterator from Convolutional Neural Network for Text Classification