/pytorch-sentiment-analysis-kor

Sentiment analysis models implemented with PyTorch and torchtext on a Korean corpus

Sentiment Analysis PyTorch implementations

This repo contains various sequential models used to classify the sentiment of a sentence.

The base code is adapted from this great sentiment-analysis tutorial.

In this project, I used the Korean corpus NSMC (Naver Sentiment Movie Corpus) to apply torchtext to a Korean dataset.

I also used the soynlp library to tokenize Korean sentences. It is really nice and easy to use; you should try it if you handle Korean sentences :)


Overview

  • Number of training examples: 105,000
  • Number of validation examples: 45,000
  • Number of test examples: 50,000
  • Number of possible classes: 2 (pos / neg)
Example:
{
  'text': ['액션', '이', '없는', '데도', '재미', '있는', '몇안되는', '영화'],
  'label': 'pos'
}
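
Before examples like the one above reach a model, each token has to be mapped to an integer index. torchtext takes care of this when the vocabulary is built; the snippet below is only a standard-library sketch of the idea, not the code used in this repo. The helper names `build_vocab` and `numericalize`, the special tokens, and the made-up unseen token are assumptions for illustration.

```python
from collections import Counter

# Illustrative special tokens (an assumption, not taken from this repo).
UNK, PAD = "<unk>", "<pad>"


def build_vocab(examples, min_freq=1):
    """Map every token seen at least `min_freq` times to an integer index."""
    counts = Counter(tok for ex in examples for tok in ex["text"])
    itos = [UNK, PAD] + [tok for tok, c in counts.most_common() if c >= min_freq]
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(tokens, stoi):
    """Convert tokens to indices; unseen tokens fall back to the UNK index."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]


examples = [
    {
        "text": ["액션", "이", "없는", "데도", "재미", "있는", "몇안되는", "영화"],
        "label": "pos",
    }
]

stoi = build_vocab(examples)
# The last token is not in the vocabulary, so it maps to the UNK index.
ids = numericalize(examples[0]["text"] + ["처음보는단어"], stoi)
print(ids)
```

In the repo itself the vocabulary is stored in text.pickle and label.pickle, so this mapping does not need to be written by hand.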

Requirements

  • The following libraries are fundamental to this repo. Since I used a conda environment, requirements.txt lists many more dependencies.
  • If you encounter any dependency problems, just run the following command:
    • pip install -r requirements.txt
numpy==1.16.4
pandas==0.25.1
scikit-learn==0.21.3
soynlp==0.0.493
torch==1.2.0
torchtext==0.4.0

Models


Usage

  • Before training the model, you should train the soynlp tokenizer on your training dataset and build the vocabulary using the following command.
  • Running it produces tokenizer.pickle, text.pickle and label.pickle, which are used to train and test the model and to predict the sentiment of a user's input sentence.
python build_pickle.py
  • For training, run main.py in train mode (which is the default)
python main.py --model MODEL_NAME
  • For testing, run main.py in test mode
python main.py --model MODEL_NAME --mode test 
  • For predicting, run predict.py with your Korean input sentence.
  • Don't forget to wrap your input in double quotation marks!
python predict.py --model MODEL_NAME --input "YOUR_INPUT"

Example

[in]  >> 노잼 뻔한 스토리 뻔한 결말...
[out] >> 0.84 % : Negative

[in]  >> 마음도 따뜻.마요미의 진가. 그리고 감동. 뭐 힐링타임용으로 무난한 가족영화탄생~^^
[out] >> 97.64 % : Positive

[in]  >> 클리쉐 덩어리 예산도 적게들었을듯 한데 마지막 관중조차 CG
[out] >> 26.68 % : Negative
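
Judging only from the three examples above, the printed percentage looks like the model's probability for the positive class, with the label switching at 50 %. That reading is an assumption, not something this README states. Under that assumption, the output line could be produced by a helper like the following (the name `format_prediction` and the `threshold` parameter are hypothetical):

```python
def format_prediction(positive_prob, threshold=0.5):
    """Format a positive-class probability in the style of the examples above.

    Assumes `positive_prob` is in [0, 1] and that anything below
    `threshold` is reported as Negative.
    """
    label = "Positive" if positive_prob >= threshold else "Negative"
    return f"{positive_prob * 100:.2f} % : {label}"


for prob in (0.0084, 0.9764, 0.2668):
    print("[out] >>", format_prediction(prob))
```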

  • You can test the trained model with the following command
curl -X POST https://us-central1-nlp-api-252209.cloudfunctions.net/sentiment \
 -H 'Content-Type:application/json' \
 -d '{"input":"YOUR INPUT IN KOREAN"}'
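
The same request can be sent from Python using only the standard library. This is a minimal sketch: the helper names `build_request` and `predict_remote` are hypothetical, the response format is not documented here so the body is returned as raw text, and the endpoint may no longer be online.

```python
import json
import urllib.request

API_URL = "https://us-central1-nlp-api-252209.cloudfunctions.net/sentiment"


def build_request(sentence, url=API_URL):
    """Build a POST request carrying the same JSON body as the curl command."""
    payload = json.dumps({"input": sentence}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_remote(sentence, timeout=10):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(sentence), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (requires network access and a live endpoint):
# print(predict_remote("노잼 뻔한 스토리 뻔한 결말..."))
```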