text_classification_tf: A Python repository from ljw23

Supported tasks

Binary-class text classification
Multi-class text classification
Multi-label text classification
Hierarchical (multi-label) text classification (HMC)

Supported text encoders

TextCNN (Kim, 2014)
RCNN (Lai et al., 2015)
TextRNN (Liu et al., 2016)
FastText (Joulin et al., 2016)
VDCNN (Conneau et al., 2016)
DPCNN (Johnson and Zhang, 2017)
AttentiveConvNet (Yin and Schütze, 2017)
DRNN (Wang, 2018)
Region embedding (Qiao et al., 2018)
Transformer encoder (Vaswani et al., 2017)
Star-Transformer encoder (Guo et al., 2019)

Requirements

Python 3
TensorFlow 2.0+
NumPy 1.14.3+

Usage

Training

python train.py conf/train.json

For detailed configuration options and explanations, see Configuration.

Training information is written to standard output and to log.logger_file.

Evaluation

python eval.py conf/train.json

If eval.is_flat is false, hierarchical evaluation results are output.
eval.model_dir specifies the model to evaluate.
data.test_json_files specifies the input text files to evaluate.

Evaluation information is written to eval.dir.
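
The dotted option names above suggest a nested JSON configuration. As a rough sketch, assuming each dotted name maps to a nested key (for example, "eval.is_flat" is the "is_flat" entry inside an "eval" object), the options could be read as follows. The helper get_option, the placeholder paths, and the assumption that data.test_json_files is a list are all illustrative and not taken from the repository.

```python
import json


def get_option(config, dotted_key):
    """Look up a dotted key such as "eval.is_flat" in a nested dict."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


# Hypothetical configuration fragment; the values are placeholders, not defaults.
config = json.loads("""
{
  "eval": {"is_flat": false, "model_dir": "path/to/model", "dir": "path/to/eval_output"},
  "data": {"test_json_files": ["path/to/test.json"]},
  "log": {"logger_file": "path/to/log_file"}
}
""")

# eval.is_flat = false selects hierarchical evaluation, as described above.
mode = "flat" if get_option(config, "eval.is_flat") else "hierarchical"
test_files = get_option(config, "data.test_json_files")
print(f"evaluation mode: {mode}; test files: {test_files}")
```

In practice the dictionary would come from the file passed on the command line (conf/train.json) rather than from an inline string.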

Input Data Format

JSON example:

{
    "doc_label": ["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    "doc_token": ["I", "love", "deep", "learning"],
    "doc_keyword": ["deep learning"],
    "doc_topic": ["AI", "Machine learning"]
}

"doc_keyword" and "doc_topic" are optional.

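The record layout above can be sketched in Python. This is a minimal illustration, assuming that every field is a list of strings, that "--" separates the levels of a hierarchical label (as the example labels suggest), and that each record is serialized as one JSON object per line. The helpers make_record, validate_record, and split_label_path are hypothetical names, not part of the toolkit.

```python
import json

REQUIRED_FIELDS = ("doc_label", "doc_token")
OPTIONAL_FIELDS = ("doc_keyword", "doc_topic")


def make_record(labels, tokens, keywords=None, topics=None):
    """Build one record with the four fields shown in the JSON example."""
    return {
        "doc_label": list(labels),
        "doc_token": list(tokens),
        "doc_keyword": list(keywords or []),
        "doc_topic": list(topics or []),
    }


def validate_record(record):
    """Check required fields exist and every known field is a list of strings."""
    for field in REQUIRED_FIELDS:
        if field not in record:
            raise ValueError(f"missing required field: {field}")
    for field in REQUIRED_FIELDS + OPTIONAL_FIELDS:
        value = record.get(field, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{field} must be a list of strings")
    return True


def split_label_path(label, separator="--"):
    """Split a hierarchical label into levels (assumes "--" separates levels)."""
    return label.split(separator)


record = make_record(
    labels=["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    tokens=["I", "love", "deep", "learning"],
    keywords=["deep learning"],
    topics=["AI", "Machine learning"],
)
validate_record(record)
line = json.dumps(record)  # one record per line is an assumed file layout
paths = [split_label_path(label) for label in record["doc_label"]]
print(paths)
```

Whether optional fields may be omitted entirely or must be present as empty lists is not stated above; the sketch writes them as empty lists and accepts either form when validating.
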
Performance

0. Dataset

Dataset	Taxonomy	#Label	#Training	#Test
RCV1	Tree	103	23,149	781,265
Yelp	DAG	539	87,375	37,265

RCV1: Lewis et al., 2004
Yelp: Yelp

1. Comparison with the state of the art

Text Encoders	Micro-F1 on RCV1	Micro-F1 on Yelp
HR-DGCNN (Peng et al., 2018)	0.7610	-
HMCN (Wehrmann et al., 2018)	0.8080	0.6640
Ours	0.8313	0.6704

HR-DGCNN: Peng et al., 2018
HMCN: Wehrmann et al., 2018
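
For readers unfamiliar with the metric in the table, the following is a small sketch of the standard micro-averaged F1 for multi-label predictions: true positives, false positives, and false negatives are pooled over all documents before precision and recall are computed. It illustrates the general definition only; the toolkit's own evaluator may differ, for example in how it treats hierarchical labels. The function name micro_f1 and the toy labels are made up for this example.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over per-document label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: two documents with gold and predicted label sets.
gold = [["A", "B"], ["C"]]
predicted = [["A"], ["C", "D"]]
score = micro_f1(gold, predicted)
print(f"micro-F1 = {score:.4f}")
```

Here the pooled counts are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and the micro-F1 is 2/3.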

Acknowledgement

Our toolkit references some publicly available code:

Update

2019-04-29, init version
