Text-classification-with-CNN-RNN

Text classification with CNN, RNN, and RCNN models at the character and word level

An example of building a classifier for Korean documents
Built with TensorFlow (tensorflow-gpu==1.8.0), using convolutional and recurrent layers
Uses real data (collected some time ago for study purposes; I will take it down if it causes any problems.)

Ch01_Data_load

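data_preprocessing.py: basic text preprocessing
- Remove special characters
- Keep only sentences whose length falls between the 10th and 95th percentiles of the corpus
- Scores of 2 or lower are labeled negative and scores of 5 positive, keeping the label ratio as close to 50:50 as possible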
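
The three preprocessing steps above can be sketched as follows. This is a minimal illustration, not the code in data_preprocessing.py: the function names, the set of characters that are kept, measuring length in characters, and dropping scores of 3 and 4 are all assumptions.

```python
import re
import numpy as np

# Assumption: keep Hangul syllables, Hangul jamo, Latin letters, digits and
# whitespace; everything else counts as a "special character".
_SPECIAL = re.compile(r"[^0-9A-Za-z가-힣ㄱ-ㅎㅏ-ㅣ\s]")


def clean_text(text):
    """Replace special characters with spaces and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", _SPECIAL.sub(" ", text)).strip()


def preprocess(texts, scores, low_pct=10, high_pct=95):
    """Return (texts, labels) after cleaning, length filtering and labeling."""
    cleaned = [clean_text(t) for t in texts]
    lengths = np.array([len(t) for t in cleaned])  # length in characters (assumption)
    lo, hi = np.percentile(lengths, [low_pct, high_pct])

    out_texts, out_labels = [], []
    for text, length, score in zip(cleaned, lengths, scores):
        if not (lo <= length <= hi):
            continue  # outside the 10th..95th percentile band
        if score <= 2:
            label = 0  # negative
        elif score == 5:
            label = 1  # positive
        else:
            continue  # assumption: scores 3 and 4 receive no label and are dropped
        out_texts.append(text)
        out_labels.append(label)
    return out_texts, out_labels
```

Balancing the two classes (for example by downsampling the larger one) would be a separate step after this function and is not shown here.
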
data_load.py: loads the data
Jaso_mapping_utils.py: converts jaso (individual Hangul letters) to one-hot vectors at the tensor level
- Applied to models that take jaso-level input
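
To make the jaso-level input concrete, here is a small sketch of decomposing Hangul syllables into jaso and one-hot encoding them. It uses NumPy outside the TensorFlow graph, so it is not the implementation in Jaso_mapping_utils.py; the names `to_jaso`, `jaso_onehot`, and `ALPHABET`, and the choice to leave unknown characters as all-zero rows, are assumptions. The decomposition arithmetic itself follows the Unicode layout of precomposed Hangul syllables (19 initials, 21 medials, 28 finals starting at U+AC00).

```python
import numpy as np

CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")          # 19 initial consonants
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")      # 21 medial vowels
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals, "" = none

# Assumed alphabet: every distinct jaso plus the space character.
ALPHABET = sorted(set(CHO + JUNG + JONG[1:])) + [" "]
CHAR_TO_ID = {ch: i for i, ch in enumerate(ALPHABET)}


def to_jaso(text):
    """Split each precomposed Hangul syllable into its jaso; pass other characters through."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:                      # precomposed syllable block
            out.append(CHO[code // 588])           # 588 = 21 * 28
            out.append(JUNG[(code % 588) // 28])
            if code % 28:                          # 0 means no final consonant
                out.append(JONG[code % 28])
        else:
            out.append(ch)
    return out


def jaso_onehot(text, max_len):
    """Return a (max_len, len(ALPHABET)) one-hot matrix, truncated or zero-padded."""
    mat = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for pos, jaso in enumerate(to_jaso(text)[:max_len]):
        idx = CHAR_TO_ID.get(jaso)
        if idx is not None:                        # unknown characters stay all-zero
            mat[pos, idx] = 1.0
    return mat
```

For example, `to_jaso("한글")` yields the six jaso ㅎ, ㅏ, ㄴ, ㄱ, ㅡ, ㄹ, and `jaso_onehot("한글", 8)` has six rows with a single 1 followed by two zero rows of padding.
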
make_VocabularyProcessor.py: converts words to indices at the tensor level
- Creates the object using the provided VocabularyProcessor
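
The word-to-index mapping can be illustrated with a dependency-free stand-in. The `SimpleVocab` class below is hypothetical and does not reproduce the VocabularyProcessor API; it only shows the idea of fitting a vocabulary and producing fixed-length index sequences. Whitespace tokenization and using index 0 for both padding and unknown words are assumptions.

```python
import numpy as np


class SimpleVocab:
    """Hypothetical stand-in: maps whitespace tokens to integer ids."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.word_to_id = {"<UNK>": 0}  # id 0 doubles as padding and unknown

    def fit(self, docs):
        """Assign the next free id to every word not seen before."""
        for doc in docs:
            for word in doc.split():
                self.word_to_id.setdefault(word, len(self.word_to_id))
        return self

    def transform(self, docs):
        """Return an int array of shape (len(docs), max_len), truncated or zero-padded."""
        out = np.zeros((len(docs), self.max_len), dtype=np.int64)
        for i, doc in enumerate(docs):
            for j, word in enumerate(doc.split()[: self.max_len]):
                out[i, j] = self.word_to_id.get(word, 0)
        return out


vocab = SimpleVocab(max_len=4).fit(["영화 정말 재미있다", "영화 별로"])
ids = vocab.transform(["정말 별로 영화", "처음 보는 단어 영화 정말"])
```

Here the first document is padded with a trailing 0, while the second is truncated to four tokens and its three unseen words map to 0.
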
utils.py: defines shared helper functions such as layers, batch generation, and the tokenizer

character level CNN text classifier

character level RNN text classifier

word level RNN text classifier

word level RNN text classifier with attention

Text_RNN_word_attention_config.py:
- Defines the model's hyper-parameters
Text_RNN_word_attention_model.py:
- Model class
Text_RNN_word_attention_train.py:
- Trains the model, writes summaries for inspection in TensorBoard, and saves the learned parameters
Text_RNN_word_attention_predict.py:
- Applies the trained model to the test data and computes performance metrics
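
As a rough picture of what "with attention" can mean for a word-level RNN classifier, the sketch below pools a sequence of RNN outputs into one vector with additive attention. It is written in NumPy for readability and is an assumption about the general technique, not a description of Text_RNN_word_attention_model.py; the function name, parameter names, and shapes are all hypothetical.

```python
import numpy as np


def attention_pool(hidden, w, b, u, mask=None):
    """Additive attention pooling over time steps.

    hidden: (T, D) RNN outputs, w: (D, A), b: (A,), u: (A,).
    mask:   optional length-T booleans, True for real tokens (at least one True).
    Returns (context of shape (D,), weights of shape (T,)).
    """
    scores = np.tanh(hidden @ w + b) @ u              # one score per time step
    if mask is not None:
        scores = np.where(np.asarray(mask, dtype=bool), scores, -np.inf)
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores)                          # padded steps become 0
    weights = weights / weights.sum()                 # softmax over time
    context = weights @ hidden                        # weighted sum of outputs
    return context, weights


rng = np.random.RandomState(0)
hidden = rng.randn(5, 4)                              # 5 time steps, 4 hidden units
w, b, u = rng.randn(4, 3), np.zeros(3), rng.randn(3)
context, weights = attention_pool(hidden, w, b, u, mask=[True, True, True, False, False])
```

The resulting `context` vector would then be fed to the final classification layer, and `weights` can be inspected to see which words the model attended to.
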

word level RCNN text classifier with attention

Text_RCNN_word_attention_config.py:
- Defines the model's hyper-parameters
Text_RCNN_word_attention_model.py:
- Model class
Text_RCNN_word_attention_train.py:
- Trains the model, writes summaries for inspection in TensorBoard, and saves the learned parameters
Text_RCNN_word_attention_predict.py:
- Applies the trained model to the test data and computes performance metrics
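
The predict scripts report performance metrics on the test data. Which metrics they compute is not stated here, so the following is only a sketch of common choices for a binary positive/negative task (accuracy, precision, recall, F1); the function name and the returned keys are assumptions.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```
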