Implementations of NLP papers on classification, written in PyTorch
The papers were implemented using Korean corpora
pyenv virtualenv 3.7.7 nlp
pyenv activate nlp
pip install -r requirements.txt
python build_dataset.py
python build_vocab.py
python train.py # default training parameters
python evaluate.py # default evaluation parameters
Single sentence classification (sentiment classification task)
- Using the Naver sentiment movie corpus v1.0 (a.k.a. nsmc)
- Configuration
  - conf/model/{type}.json (e.g. type = ["sencnn", "charcnn",...])
  - conf/dataset/nsmc.json
- Structure
# example: Convolutional_Neural_Networks_for_Sentence_Classification
├── build_dataset.py
├── build_vocab.py
├── conf
│ ├── dataset
│ │ └── nsmc.json
│ └── model
│ └── sencnn.json
├── evaluate.py
├── experiments
│ └── sencnn
│ └── epochs_5_batch_size_256_learning_rate_0.001
├── model
│ ├── data.py
│ ├── __init__.py
│ ├── metric.py
│ ├── net.py
│ ├── ops.py
│ ├── split.py
│ └── utils.py
├── nsmc
│ ├── ratings_test.txt
│ ├── ratings_train.txt
│ ├── test.txt
│ ├── train.txt
│ ├── validation.txt
│ └── vocab.pkl
├── train.py
└── utils.py
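
The configuration paths and the experiment directory shown above follow a regular naming pattern, which can be sketched roughly as below. This is an illustration only: the helper names `config_path`, `load_config`, and `experiment_dir` are hypothetical and not taken from the repository, and the mapping from hyperparameters to the directory name is inferred from the single example `epochs_5_batch_size_256_learning_rate_0.001`.

```python
import json
from pathlib import Path


def config_path(kind, name, root="conf"):
    """Return conf/{kind}/{name}.json, e.g. conf/model/sencnn.json (hypothetical helper)."""
    return Path(root) / kind / f"{name}.json"


def load_config(kind, name, root="conf"):
    """Read one JSON configuration file into a dict (hypothetical helper)."""
    with open(config_path(kind, name, root), encoding="utf-8") as f:
        return json.load(f)


def experiment_dir(model_type, epochs, batch_size, learning_rate, root="experiments"):
    """Build a path such as experiments/sencnn/epochs_5_batch_size_256_learning_rate_0.001.

    The naming scheme is an assumption inferred from the directory tree.
    """
    run_name = f"epochs_{epochs}_batch_size_{batch_size}_learning_rate_{learning_rate}"
    return Path(root) / model_type / run_name
```

For instance, `experiment_dir("sencnn", 5, 256, 0.001)` yields the directory listed in the tree, and `load_config("dataset", "nsmc")` would read `conf/dataset/nsmc.json` when run from the example's root directory.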
| Model \ Accuracy | Train (120,000) | Validation (30,000) | Test (50,000) | Date |
|---|---|---|---|---|
| SenCNN | 91.95% | 86.54% | 85.84% | 20/05/30 |
| CharCNN | 86.29% | 81.69% | 81.38% | 20/05/30 |
| ConvRec | 86.23% | 82.93% | 82.43% | 20/05/30 |
| VDCNN | 86.59% | 84.29% | 84.10% | 20/05/30 |
| SAN | 90.71% | 86.70% | 86.37% | 20/05/30 |
| ETRIBERT | 91.12% | 89.24% | 88.98% | 20/05/30 |
| SKTBERT | 92.20% | 89.08% | 88.96% | 20/05/30 |
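
The Train (120,000) and Validation (30,000) counts add up to 150,000 examples, which is consistent with a 20% validation hold-out, presumably taken from `ratings_train.txt`. A minimal sketch of such a split using only the standard library follows. The function name `holdout_split`, the seed, and the shuffling strategy are assumptions for illustration and may differ from what `build_dataset.py` actually does.

```python
import random


def holdout_split(examples, valid_ratio=0.2, seed=42):
    """Shuffle a copy of `examples` and split off a validation portion.

    Hypothetical helper: the ratio and seed are illustrative assumptions.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(round(len(shuffled) * valid_ratio))
    # The first n_valid shuffled items become validation, the rest training.
    return shuffled[n_valid:], shuffled[:n_valid]


# Stand-in for 150,000 reviews; real rows would be read from the corpus file.
train, validation = holdout_split(range(150_000))
print(len(train), len(validation))  # 120000 30000
```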
Pairwise text classification (paraphrase detection task)
# example: Siamese_recurrent_architectures_for_learning_sentence_similarity
├── build_dataset.py
├── build_vocab.py
├── conf
│ ├── dataset
│ │ └── qpair.json
│ └── model
│ └── siam.json
├── evaluate.py
├── experiments
│ └── siam
│ └── epochs_5_batch_size_64_learning_rate_0.001
├── model
│ ├── data.py
│ ├── __init__.py
│ ├── metric.py
│ ├── net.py
│ ├── ops.py
│ ├── split.py
│ └── utils.py
├── qpair
│ ├── kor_pair_test.csv
│ ├── kor_pair_train.csv
│ ├── test.txt
│ ├── train.txt
│ ├── validation.txt
│ └── vocab.pkl
├── train.py
└── utils.py
| Model \ Accuracy | Train (6,136) | Validation (682) | Test (758) | Date |
|---|---|---|---|---|
| Siam | 93.00% | 83.13% | 83.64% | 20/05/30 |
| SAN | 89.47% | 82.11% | 81.53% | 20/05/30 |
| Stochastic | 89.26% | 82.69% | 80.07% | 20/05/30 |
| ETRIBERT | 95.07% | 94.42% | 94.06% | 20/05/30 |
| SKTBERT | 95.43% | 92.52% | 93.93% | 20/05/30 |
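
Accuracy here is the share of examples whose predicted label matches the gold label. A minimal, framework-free sketch follows. The function name `accuracy` is hypothetical, and the repository's own `model/metric.py` may compute it differently (for example on batched tensors).

```python
def accuracy(predictions, labels):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy example: 3 of 4 paraphrase decisions are correct.
print(f"{accuracy([1, 0, 1, 1], [1, 0, 0, 1]):.2%}")  # 75.00%
```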