thisisiron/char-cnn-text-classification-keras

🎵 Character Level CNN For Text Classification implemented as Keras

Jupyter NotebookMIT

char-cnn-text-classification-keras

Summarization: https://thisisiron.github.io/nlp/Char-level-CNN/
PPT material: https://docs.google.com/presentation/d/1OpFnpL0BZkKadWoRvxkUbVPeHXSj8tpnpOchfNCv8hQ/edit?usp=sharing

Requirements

Python 3
Tensorflow 1.12

Training

python train.py

Evaluating

python eval.py  --weights_path $WEIGHTS_DIR

Data

Data Details

~~kaggle data: https://www.kaggle.com/c/word2vec-nlp-tutorial~~
Sentiment140 - A Twitter Sentiment Analysis Tool: http://help.sentiment140.com/for-students/

Data Download

Experiment

	Train Set ACC	Validation Set ACC	Test Set ACC
CharCNN	87.07%	82.21%	--%

CharCNN

optimizer: SGD (Adam)
alphabet: abcdefghijklmnopqrstuvwxyz0123456789,;.!?:'"/\|_@#$%^&*~`+-=<>()[]{}
embedding size: 69 (the number of alphabet)

TODO

Character Level CNN은 Dataset이 클 경우 좋은 결과를 보여주므로 Big Dataset으로 교체하여 실험
Jupyter code를 py code 형식으로 변환
Text Cleaning 수행 (https:// 주소 형식, @ID와 몇몇 특수 기호 삭제)
Save and Load 구현

Reference

Character-level Convolutional Networks for Text Classification