Comment-Sentiment-Analysis

Comment Sentiment Analysis using Deep Learning

Primary Language: Python

📌 Author : Minku Koo

📌 Project Period : Dec/2020 ~ Jan/2021

📌 Contact : corleone@kakao.com

📌 Main Libraries : TensorFlow, Keras, KoNLPy

📌 Keywords : "Sentiment Analysis", "Machine Learning", "Korean", "Deep Learning"

📃 Table of Contents

1. Scraping Comment Data

  • Python Crawler : ./python-code/comment_crawling.py
  • Target Sites : Naver and Daum news comments
  • Scraped Data : Comment, Reply, Article Date (+ Title, Content)
  • News Search Keywords : "기독교" (Christianity), "불교" (Buddhism), "천주교" (Catholicism), "신천지" (Shincheonji), "종교" (religion)
  • Storage : Database (MariaDB)
  • Database export to text files - path : ./comment/raw-comment/

🔍 Scraping Period per Religion

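| Search keyword | Collection start (YY.MM.DD) | Reference date | Collection end |
|---|---|---|---|
| 신천지 (Shincheonji) | 19.09.17 | 20.02.17 | 20.07.18 |
| 기독교 (Christianity) | 19.08.20 | 20.01.20 | 20.10.20 |
| 천주교 (Catholicism) | 19.08.20 | 20.01.20 | 20.08.20 |
| 불교 (Buddhism) | 19.08.20 | 20.01.20 | 20.08.20 |
| 종교 (Religion) | 19.08.20 | 20.01.20 | 20.10.10 |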
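The reference date splits each keyword's collection window into a "before" and an "after" period. A minimal sketch of that split is shown here; the names `PERIODS`, `parse_day`, and `classify_period` are hypothetical, and treating the reference date itself as "after" is an assumption, not something the project states.

```python
from datetime import date, datetime
from typing import Optional

# Collection windows copied from the period table: (start, reference, end).
PERIODS = {
    "신천지": ("19.09.17", "20.02.17", "20.07.18"),
    "기독교": ("19.08.20", "20.01.20", "20.10.20"),
    "천주교": ("19.08.20", "20.01.20", "20.08.20"),
    "불교": ("19.08.20", "20.01.20", "20.08.20"),
    "종교": ("19.08.20", "20.01.20", "20.10.10"),
}


def parse_day(text: str) -> date:
    """Parse a YY.MM.DD string such as '20.01.20'."""
    return datetime.strptime(text, "%y.%m.%d").date()


def classify_period(keyword: str, article_day: date) -> Optional[str]:
    """Return 'before', 'after', or None if the day is outside the window.

    Assumption: the reference date itself counts as 'after'.
    """
    start, reference, end = (parse_day(d) for d in PERIODS[keyword])
    if not start <= article_day <= end:
        return None
    return "before" if article_day < reference else "after"
```

For example, `classify_period("기독교", date(2019, 12, 31))` falls in the "before" period, while a day after the keyword's collection end returns `None`.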

🔍 Scraped Data Results

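| Search keyword | Articles (before) | Comments (before) | Articles (after) | Comments (after) |
|---|---|---|---|---|
| 신천지 (Shincheonji) | 211 | 22,658 | 2,974 | 262,840 |
| 기독교 (Christianity) | 1,771 | 94,405 | 1,186 | 85,443 |
| 천주교 (Catholicism) | 1,899 | 37,010 | 1,685 | 56,881 |
| 불교 (Buddhism) | 833 | 6,465 | 420 | 7,585 |
| 종교 (Religion) | 1,939 | 52,527 | 2,373 | 122,206 |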

2. Labeling Comment Data

  • path : ./train-data/
  • Human-labeled comments : ./train-data/comment-labeling.csv
  • Naver movie review data : naver-ratings.csv
  • ( Data from Here )

3. Using KoNLPy Okt

Text Data Preprocessing

  • POS-tag each comment : okt.pos(comment)
  • Remove tokens tagged 'Josa' (particle), 'Punctuation', 'Number'
  • Save path : ./comment/after-okt-comment/
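The preprocessing step above can be sketched as follows. This is an illustration only: the helper names `filter_tokens` and `preprocess` are hypothetical, and joining the kept tokens with spaces is an assumption about the saved format.

```python
from typing import Iterable, List, Tuple

# POS tags dropped in the preprocessing step described above.
REMOVED_TAGS = {"Josa", "Punctuation", "Number"}


def filter_tokens(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep only the words whose POS tag is not in REMOVED_TAGS."""
    return [word for word, tag in tagged if tag not in REMOVED_TAGS]


def preprocess(comment: str, tagger=None) -> str:
    """Tag a comment and return the kept tokens joined by spaces.

    Pass one shared Okt instance as `tagger` when processing many comments,
    since creating a tagger is slow.
    """
    if tagger is None:
        # Imported lazily because KoNLPy needs a Java runtime to be installed.
        from konlpy.tag import Okt
        tagger = Okt()
    return " ".join(filter_tokens(tagger.pos(comment)))
```

Because `filter_tokens` only looks at `(word, tag)` pairs, it can be checked without KoNLPy installed by passing hand-written pairs.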

4. Build Deep Learning Network using Keras

  • Python File Name : ./python-code/make_rnn_model.py
  • Train Data path : ./train-data/
  • Crawled comments + Naver movie reviews => transfer learning
  • Comment text is converted to vectors (using TextVectorization)
  • Training accuracy : 0.95
  • Validation accuracy : 0.83
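For orientation, here is a minimal sketch of a Keras text classifier built around TextVectorization. It is not the contents of make_rnn_model.py: the LSTM layer, the layer sizes, the vocabulary size, and the function name `build_model` are all assumptions chosen for illustration.

```python
def build_model(train_texts, max_tokens=20000, sequence_length=50):
    """Build a small binary sentiment classifier over space-separated tokens."""
    # Imported lazily so this module can be loaded without TensorFlow present.
    import tensorflow as tf

    # Map each comment to a fixed-length sequence of integer token ids.
    vectorizer = tf.keras.layers.TextVectorization(
        max_tokens=max_tokens,
        output_mode="int",
        output_sequence_length=sequence_length,
    )
    vectorizer.adapt(tf.constant(list(train_texts)))

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(1,), dtype="string"),
        vectorizer,
        tf.keras.layers.Embedding(max_tokens, 64, mask_zero=True),
        tf.keras.layers.LSTM(64),  # assumed recurrent layer
        tf.keras.layers.Dense(1, activation="sigmoid"),  # 1 = positive, 0 = negative
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

One plausible reading of the transfer-learning step is to call `model.fit` first on the movie reviews and then again on the labeled comments, so the second run starts from the weights learned in the first. That ordering is an assumption about the project, not a documented detail.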

5. Predict Sentiment Values

  1. Build a JSON file : dict[date][article] = [[comment list], []]
  2. Label every comment with the deep learning model
  3. Update the JSON file : dict[date][article] = [[comment list], [sentiment value list]] (path: ./comment/json-okt-comment)
  4. Calculate the sentiment value per date
    • Each article's sentiment : weighted average (weight = article comment count / date comment count)
    • Each date's sentiment : computed using IMDb's rating system
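The per-date calculation can be sketched as below. The project does not spell out how IMDb's rating system was applied, so this assumes the widely quoted weighted-rating formula WR = v/(v+m) * R + m/(v+m) * C, with R the date's weighted mean, v its comment count, C the mean over all comments, and m a minimum-count threshold. All function names and the default `min_comments=100` are hypothetical.

```python
from typing import Dict, List

# Mirrors dict[date][article] = [[comment list], [sentiment value list]].
Corpus = Dict[str, Dict[str, List[list]]]


def date_weighted_mean(articles: Dict[str, List[list]]) -> float:
    """Average the article means, each weighted by its share of the day's comments."""
    total = sum(len(scores) for _, scores in articles.values())
    if total == 0:
        raise ValueError("no scored comments for this date")
    result = 0.0
    for _, scores in articles.values():
        if scores:
            article_mean = sum(scores) / len(scores)
            result += article_mean * (len(scores) / total)
    return result


def imdb_weighted_rating(r: float, v: int, m: int, c: float) -> float:
    """Weighted rating: WR = v/(v+m) * R + m/(v+m) * C."""
    return (v / (v + m)) * r + (m / (v + m)) * c


def date_scores(corpus: Corpus, min_comments: int = 100) -> Dict[str, float]:
    """Pull each date's mean towards the overall mean when it has few comments."""
    all_scores = [
        s for articles in corpus.values()
        for _, scores in articles.values() for s in scores
    ]
    if not all_scores:
        return {}
    overall_mean = sum(all_scores) / len(all_scores)
    result = {}
    for day, articles in corpus.items():
        count = sum(len(scores) for _, scores in articles.values())
        if count == 0:
            continue  # skip dates that have no scored comments
        day_mean = date_weighted_mean(articles)
        result[day] = imdb_weighted_rating(day_mean, count, min_comments, overall_mean)
    return result
```

With these weights, a date with only a handful of comments stays close to the overall mean, while a date with many comments is dominated by its own mean.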

6. Results (Graphs)

Mean and Standard Deviation per Religion

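| Search keyword | Mean (before) | Std. dev. (before) | Mean (after) | Std. dev. (after) |
|---|---|---|---|---|
| 신천지 (Shincheonji) | 0.381 | 0.412 | 0.313 | 0.388 |
| 기독교 (Christianity) | 0.310 | 0.372 | 0.276 | 0.371 |
| 천주교 (Catholicism) | 0.375 | 0.405 | 0.284 | 0.377 |
| 불교 (Buddhism) | 0.356 | 0.392 | 0.272 | 0.369 |
| 종교 (Religion) | 0.313 | 0.376 | 0.271 | 0.367 |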
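Summary statistics of this kind can be computed with the standard library. The sketch below assumes two lists of sentiment values (before and after the reference date) and uses the population standard deviation; whether the project used the population or the sample formula is not stated, and the name `summarize` is hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple


def summarize(before: Sequence[float], after: Sequence[float]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, population standard deviation) for each period."""
    return {
        "before": (mean(before), pstdev(before)),
        "after": (mean(after), pstdev(after)),
    }
```

Swapping `pstdev` for `statistics.stdev` gives the sample standard deviation instead.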

Average Sentiment Bar Graph per Religion

(path : ./result-graph/emotion-average-stick/)

Sentiment over Time Graph

(path : ./result-graph/emotion-flow/)

  • Before COVID-19 : green
  • After COVID-19 : red
  • y axis
    • close to 1 : Positive
    • close to 0 : Negative

✔ 천주교 (Catholicism)

✔ 종교 (Religion)

Total Comment Count per Month per Religion

(path : ./result-graph/comment-count/)

Word Cloud per Religion

(path : ./result-graph/word-cloud/)

✔ Before COVID-19, 기독교 (Christianity)

✔ After COVID-19, 기독교 (Christianity)

Top 30 Words per Religion

(path : ./result-graph/word-cloud/)

✔ Before COVID-19, 신천지 (Shincheonji)

✔ After COVID-19, 신천지 (Shincheonji)
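A top-30 word list like the ones charted here can be produced with a simple frequency count. This sketch assumes each comment has already been preprocessed into space-separated tokens; the function name `top_words` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_words(comments: Iterable[str], n: int = 30) -> List[Tuple[str, int]]:
    """Count whitespace-separated tokens and return the n most frequent."""
    counts: Counter = Counter()
    for comment in comments:
        counts.update(comment.split())
    return counts.most_common(n)
```

The same word counts could also feed a word cloud, for instance through the `wordcloud` package's `generate_from_frequencies`. Whether the project did it that way is not stated.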