The goal of this repository is to implement text classification with traditional machine learning methods and deep learning methods (in PyTorch).
Note: Because I need to spend more time on research, I have stopped reproducing classic papers on text classification. However, I will still maintain this project. Any contributions, such as issues and pull requests, are welcome.
- BOW (Bag of Words)
- TFIDF (Term Frequency-Inverse Document Frequency)
- N-gram
- KNN (K-Nearest Neighbor)
- Decision Tree
- Perceptron
- Bagging
- Random Forest
- AdaBoost
- Gradient Boosting
- Naive Bayes
- SVM (Support Vector Machine)
- Logistic Regression
- DAN: Deep Unordered Composition Rivals Syntactic Methods for Text Classification
- TextCNN: Convolutional Neural Networks for Sentence Classification
- TextRNN: Recurrent Neural Network for Text Classification with Multi-Task Learning
- RCNN
- Capsule Network
- Transformer
- Elmo
- BERT
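
To make the DAN entry in the list above more concrete, here is a minimal PyTorch sketch of a deep averaging network: word embeddings are averaged over the non-padding tokens and passed through a small feed-forward classifier. This is not the repository's implementation. The class name `DAN`, the layer sizes, and the `pad_idx` convention are assumptions chosen for illustration, and the word dropout described in the DAN paper is omitted.

```python
import torch
import torch.nn as nn


class DAN(nn.Module):
    """Deep averaging network sketch: average word embeddings, then an MLP."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes, pad_idx=0):
        super().__init__()
        self.pad_idx = pad_idx  # assumed convention: index 0 marks padding
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) word indices, padded with pad_idx
        mask = (token_ids != self.pad_idx).unsqueeze(-1).float()  # (batch, seq_len, 1)
        embedded = self.embedding(token_ids) * mask
        lengths = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
        averaged = embedded.sum(dim=1) / lengths  # (batch, embed_dim)
        return self.classifier(averaged)  # unnormalized class scores


# Illustrative sizes only; none of these values come from the repository.
model = DAN(vocab_size=100, embed_dim=16, hidden_dim=32, num_classes=4)
batch = torch.tensor([[5, 8, 2, 0, 0], [7, 3, 9, 4, 1]])
logits = model(batch)  # shape: (2, 4)
```

Because padding positions are masked out before averaging, a sentence produces the same scores regardless of how much padding its batch requires.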
Methods | KNN | Decision Tree | Perceptron | Bagging | Random Forest | AdaBoost | Gradient Boosting | Naive Bayes | SVM | Logistic Regression |
---|---|---|---|---|---|---|---|---|---|---|
BoW (ngram-range=(1, 1)) | 0.665 | 0.707 | 0.742 | 0.749 | 0.757 | 0.724 | 0.713 | 0.800 | 0.823 | 0.826 |
BoW (ngram-range=(1, 2)) | 0.666 | 0.699 | 0.750 | 0.744 | 0.751 | 0.724 | 0.712 | 0.795 | 0.819 | 0.823 |
BoW (ngram-range=(1, 3)) | 0.667 | 0.700 | 0.712 | 0.748 | 0.757 | 0.724 | 0.712 | 0.795 | 0.818 | 0.824 |
BoW (ngram-range=(2, 2)) | 0.579 | 0.628 | 0.652 | 0.646 | 0.652 | 0.584 | 0.600 | 0.669 | 0.671 | 0.692 |
BoW (ngram-range=(2, 3)) | 0.578 | 0.625 | 0.625 | 0.636 | 0.648 | 0.584 | 0.600 | 0.662 | 0.667 | 0.684 |
BoW (ngram-range=(3, 3)) | 0.536 | 0.572 | 0.525 | 0.578 | 0.581 | 0.532 | 0.539 | 0.561 | 0.576 | 0.590 |
TFIDF | 0.714 | 0.705 | 0.594 | 0.759 | 0.760 | 0.723 | 0.714 | 0.807 | 0.804 | 0.824 |
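
The `ngram-range=(min_n, max_n)` rows in the table above control which n-gram lengths become features. The following pure-Python sketch illustrates that idea together with a textbook TF-IDF weighting. It assumes whitespace tokenization and the unsmoothed `tf * log(N / df)` formula; library vectorizers typically add smoothing and normalization, so this is not the code that produced the numbers in the table. The function names are hypothetical.

```python
from collections import Counter
import math


def ngrams(tokens, ngram_range=(1, 1)):
    """Return all n-grams with min_n <= n <= max_n as space-joined strings."""
    min_n, max_n = ngram_range
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def bow(doc, ngram_range=(1, 1)):
    """Bag of words: raw count of each n-gram in a whitespace-tokenized document."""
    return Counter(ngrams(doc.lower().split(), ngram_range))


def tfidf(docs, ngram_range=(1, 1)):
    """Textbook TF-IDF weights, tf * log(N / df), one dict per document."""
    counts = [bow(d, ngram_range) for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in c.items()} for c in counts]


docs = ["the movie was good", "the movie was bad"]
unigrams_and_bigrams = bow(docs[0], ngram_range=(1, 2))  # 4 unigrams + 3 bigrams
bigrams_only = bow(docs[0], ngram_range=(2, 2))          # unigrams are dropped
weights = tfidf(docs)  # terms shared by every document get weight 0
```

Note that a range such as `(2, 2)` or `(3, 3)` discards single words entirely, while `(1, 2)` and `(1, 3)` keep them and add longer n-grams on top.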
Methods | DAN | TextCNN | TextRNN |
---|---|---|---|
Trained Using Corpus | 0.880 | 0.893 | 0.903 |
Glove | 0.886 | 0.900 | 0.890 |
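
The two rows above differ in how the embedding layer is initialized: learned from the corpus, or started from pre-trained GloVe vectors. As a hedged sketch of the second option, the snippet below parses GloVe-style text lines (`word v1 v2 ...`) into a matrix and passes it to `nn.Embedding.from_pretrained`. The helper `load_vectors`, the toy vocabulary, and the in-memory stand-in for a vectors file are all assumptions made for illustration, not the repository's loading code.

```python
import io

import torch
import torch.nn as nn


def load_vectors(handle, vocab, dim):
    """Build an embedding matrix for `vocab` from GloVe-style text lines."""
    # Words without a pre-trained vector keep a small random initialization.
    matrix = torch.randn(len(vocab), dim) * 0.1
    for line in handle:
        word, *values = line.rstrip().split(" ")
        if word in vocab and len(values) == dim:
            matrix[vocab[word]] = torch.tensor([float(v) for v in values])
    return matrix


# Hypothetical vocabulary and a tiny in-memory stand-in for a real vectors file.
vocab = {"<pad>": 0, "good": 1, "bad": 2, "movie": 3}
fake_file = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")

weights = load_vectors(fake_file, vocab, dim=3)
# freeze=False lets the pre-trained vectors be fine-tuned during training.
embedding = nn.Embedding.from_pretrained(weights, freeze=False, padding_idx=0)
```

Setting `freeze=True` instead would keep the pre-trained vectors fixed, which is another design choice worth comparing.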
- Word2Vec Methods (Pre-trained and Trained Using Corpus) Comparison
- Class `Vectorizer` lazy initialization
- Cross Validation
- Grid Search
- Visualize text feature
- Shared Classifier Module for Deep Learning
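
As one possible approach to the Cross Validation and Grid Search items listed above, the sketch below combines both with scikit-learn's `GridSearchCV` over a TF-IDF plus logistic regression pipeline. It assumes scikit-learn is installed; the toy texts, labels, and parameter grid are made up for illustration and are not taken from this repository's data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data for illustration only.
texts = [
    "a good movie", "a great movie", "good acting and great plot", "great fun",
    "a bad movie", "a terrible movie", "bad acting and terrible plot", "terrible bore",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step-name prefixes ("tfidf__", "clf__") route each parameter to its pipeline step.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# cv=2 runs 2-fold cross-validation for every parameter combination.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_, search.best_score_)
```

Wrapping the vectorizer and classifier in a single pipeline means the vectorizer is refit on each training fold, so vocabulary statistics never leak from the validation fold.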