- Summary: https://thisisiron.github.io/nlp/Char-level-CNN/
- Slides (PPT): https://docs.google.com/presentation/d/1OpFnpL0BZkKadWoRvxkUbVPeHXSj8tpnpOchfNCv8hQ/edit?usp=sharing
- Python 3
- TensorFlow 1.12
python train.py
python eval.py --weights_path $WEIGHTS_DIR
Kaggle data: https://www.kaggle.com/c/word2vec-nlp-tutorial
Sentiment140 - A Twitter Sentiment Analysis Tool: http://help.sentiment140.com/for-students/
| Model | Train Set ACC | Validation Set ACC | Test Set ACC |
|---|---|---|---|
| CharCNN | 87.07% | 82.21% | --% |
CharCNN
- optimizer: SGD (Adam)
- alphabet: abcdefghijklmnopqrstuvwxyz0123456789,;.!?:'"/\|_@#$%^&*~`+-=<>()[]{}
- embedding size: 69 (the number of characters in the alphabet)
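
As a rough illustration of how the alphabet above could map text to model input, here is a minimal one-hot encoding sketch in plain Python. It is not taken from this repository. The helper name `one_hot_encode` and the `max_len` parameter are hypothetical. The alphabet string as printed above contains 68 characters, so the sketch assumes index 0 is reserved for padding and unknown characters to reach 69; that is a guess, not something this README states.

```python
# Alphabet copied from the list above (68 characters as printed).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789,;.!?:'\"/\\|_@#$%^&*~`+-=<>()[]{}"

# Assumption: index 0 is reserved for padding and unknown characters,
# which gives 68 + 1 = 69 channels. The repository may do this differently.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
VOCAB_SIZE = len(ALPHABET) + 1


def one_hot_encode(text, max_len=16):
    """Return a max_len x VOCAB_SIZE matrix of 0/1 values (hypothetical helper)."""
    # Lowercase, truncate to max_len, and map each character to its index.
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in text.lower()[:max_len]]
    # Pad short inputs with index 0 up to max_len.
    indices += [0] * (max_len - len(indices))
    matrix = []
    for idx in indices:
        row = [0] * VOCAB_SIZE
        row[idx] = 1
        matrix.append(row)
    return matrix
```

For example, `one_hot_encode("Ab!", max_len=5)` yields a 5 x 69 matrix in which the first row marks `a`, the second marks `b`, and the last two rows mark the padding index.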
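- Character-level CNNs perform well on large datasets, so switch to a larger dataset and rerun the experiments
- Convert the Jupyter notebook code to .py scripts
- Perform text cleaning (remove https:// URLs, @ID mentions, and some special symbols)
- Implement model save and load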
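
The text-cleaning item in the list above could be sketched as follows. This is an illustrative version built on Python's standard `re` module, not the code used in this repository. The function name `clean_text` is hypothetical, and because the list does not say which special symbols are removed, the set in `SYMBOL_RE` is a placeholder assumption.

```python
import re

URL_RE = re.compile(r"https?://\S+")   # http:// and https:// addresses
MENTION_RE = re.compile(r"@\w+")       # @ID style user mentions
# Assumption: the exact symbols to drop are not specified; this set is a placeholder.
SYMBOL_RE = re.compile(r"[#*~^]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Strip URLs, @mentions, and a few symbols, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_text("@bob check https://t.co/abc #wow great!"))
```

Replacing matches with a space rather than an empty string keeps neighbouring words from being glued together; the final pass collapses the extra spaces.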
Character-level Convolutional Networks for Text Classification