- Binary-class text classification
- Multi-class text classification
- Multi-label text classification
- Hierarchical (multi-label) text classification (HMC)
- TextCNN (Kim, 2014)
- RCNN (Lai et al., 2015)
- TextRNN (Liu et al., 2016)
- FastText (Joulin et al., 2016)
- VDCNN (Conneau et al., 2016)
- DPCNN (Johnson and Zhang, 2017)
- AttentiveConvNet (Yin and Schutze, 2017)
- DRNN (Wang, 2018)
- Region embedding (Qiao et al., 2018)
- Transformer encoder (Vaswani et al., 2017)
- Star-Transformer encoder (Guo et al., 2019)
- Python 3
- TensorFlow 2.0+
- NumPy 1.14.3+
python train.py conf/train.json
For detailed configuration options and explanations, see Configuration.
Training information is written to standard output and to log.logger_file.
python eval.py conf/train.json
- If eval.is_flat is false, hierarchical evaluation results are output.
- eval.model_dir specifies the model to evaluate.
- data.test_json_files specifies the input text file to evaluate.
Evaluation results are written to eval.dir.
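
The dotted option names above (such as eval.is_flat) read naturally as paths into nested objects of the JSON config file. The following is a minimal sketch under that assumption; `get_option` is a hypothetical helper, not part of the toolkit, and every value in the sample config is a placeholder.

```python
import json
from functools import reduce


def get_option(config, dotted_key):
    """Look up a dotted key such as "eval.is_flat" in a nested dict.

    Hypothetical helper: assumes each dot-separated part names one
    level of nesting in the JSON config.
    """
    return reduce(lambda node, part: node[part], dotted_key.split("."), config)


# Placeholder config text (assumed layout and values); the real
# conf/train.json contains many more options.
config_text = """
{
  "eval": {"is_flat": false, "model_dir": "path/to/model", "dir": "eval_dir"},
  "data": {"test_json_files": ["data/test.json"]}
}
"""
config = json.loads(config_text)

for key in ("eval.is_flat", "eval.model_dir", "eval.dir", "data.test_json_files"):
    print(key, "=", get_option(config, key))
```

Reading the options this way before running eval.py can help confirm that the config points at the intended model and test data.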
JSON example:
{
"doc_label": ["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
"doc_token": ["I", "love", "deep", "learning"],
"doc_keyword": ["deep learning"],
"doc_topic": ["AI", "Machine learning"]
}
"doc_keyword" and "doc_topic" are optional.
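
A record in this format can be produced with Python's standard json module. The sketch below is illustrative only: `make_record` and `label_path` are hypothetical helpers, and splitting labels on "--" is inferred from the example above rather than taken from the toolkit's code.

```python
import json


def make_record(labels, tokens, keywords=None, topics=None):
    """Build one document record with the four fields shown above.

    The optional fields are kept as empty lists when not supplied
    (an assumption; whether the keys may be omitted entirely is not
    stated in this section).
    """
    return {
        "doc_label": list(labels),
        "doc_token": list(tokens),
        "doc_keyword": list(keywords or []),
        "doc_topic": list(topics or []),
    }


def label_path(label, separator="--"):
    """Split a hierarchical label into its levels (assumed separator)."""
    return label.split(separator)


record = make_record(
    labels=["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    tokens=["I", "love", "deep", "learning"],
    keywords=["deep learning"],
    topics=["AI", "Machine learning"],
)

# Serialize the record to a single JSON string.
line = json.dumps(record)
print(line)
print(label_path(record["doc_label"][0]))
```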
Dataset | Taxonomy | #Label | #Training | #Test |
---|---|---|---|---|
RCV1 | Tree | 103 | 23,149 | 781,265 |
Yelp | DAG | 539 | 87,375 | 37,265 |
- RCV1: Lewis et al., 2004
- Yelp: Yelp
Text Encoders | Micro-F1 on RCV1 | Micro-F1 on Yelp |
---|---|---|
HR-DGCNN (Peng et al., 2018) | 0.7610 | - |
HMCN (Wehrmann et al., 2018) | 0.8080 | 0.6640 |
Ours | 0.8313 | 0.6704 |
- HR-DGCNN: Peng et al., 2018
- HMCN: Wehrmann et al., 2018
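
For readers unfamiliar with the metric in the table above, micro-averaged F1 pools true positives, false positives, and false negatives over all documents and labels before computing precision and recall. The sketch below follows that standard definition; it is not the toolkit's evaluation code, which may treat hierarchical labels differently, and `micro_f1` is a hypothetical function name.

```python
def micro_f1(true_label_sets, predicted_label_sets):
    """Micro-averaged F1 over multi-label predictions.

    Each argument is a sequence with one collection of labels per document.
    """
    tp = fp = fn = 0
    for true_labels, predicted in zip(true_label_sets, predicted_label_sets):
        true_labels, predicted = set(true_labels), set(predicted)
        tp += len(true_labels & predicted)   # labels correctly predicted
        fp += len(predicted - true_labels)   # predicted but not true
        fn += len(true_labels - predicted)   # true but not predicted
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only.
truth = [{"A", "B"}, {"C"}]
preds = [{"A"}, {"C", "D"}]
score = micro_f1(truth, preds)
print(round(score, 4))
```

Because counts are pooled before averaging, frequent labels influence the score more than rare ones.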
Our toolkit draws on the following public code:
- https://github.com/ailias/Focal-Loss-implement-on-Tensorflow/
- https://github.com/brightmart/text_classification
- 2019-04-29, initial version