- Just download the `ksenticnet_kaist.py` file :)
- There are several Korean sentiment analysis resources, such as KNU SentiLex and KOSAC.
- However, building sentiment lexicons like these requires a lot of time and human effort.
- So I decided to make the process easier and automated by combining SenticNet with the KAIST Korean WordNet (KWN).
- You can get each word's sentic values, sentiments, polarity value, and semantics.
- I recommend using it together with a POS tagger (such as Kkma).
- It follows the main process of CSenticNet.
- However, it also resolves the problem of duplicated sentic values for Korean and English words.
1. Build an {English word : synsets} dictionary from KWN.
2. Direct mapping: compare each synset's hypernyms with the semantics of SenticNet words and find matching pairs.
3. Apply the Lesk algorithm to the SenticNet words that were not matched.
4. During steps 2 and 3, some synsets receive several different sentic values. Apply a weighted average to those sentic values, weighted by AffectNet frequencies.
5. For Korean words, assign the sentic value that was assigned to the synset.
6. During step 5, some synsets contain only one Korean word. For those, use the weighted-average sentic value, as in step 4.
7. During step 5, some Korean words are assigned several different sentic values, but a weighted average cannot be used because each synset contains multiple Korean words. Instead, compute the average cosine similarity* of each synset for that Korean word and use only the most adequate synset to assign the sentic value.
* Cosine similarity is computed from tuned Korean embedding vectors. The Korean word vectors start from Facebook's pre-trained fastText vectors and are tuned with a Context2Vec-style structure. For this, I scraped example sentences for the target words from several dictionaries. The pre-trained fastText vectors are then modified and adjusted by applying a Bi-LSTM, self-attention, and a Neural Tensor Network. With these tuned vectors, we can compute the cosine similarity between a Korean word and the other Korean words in a synset, and use the average similarity as an index of 'adequacy'.
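
The two numeric steps above (the frequency-weighted average and the choice of the "most adequate" synset) can be sketched roughly as follows. This is only an illustration, not the code in `ksenticnet_kaist.py`: the function names, the toy 2-dimensional vectors, the synset ids, and the use of raw frequency counts as weights are all assumptions made for the example.

```python
import math

def weighted_sentic(values, freqs):
    """Frequency-weighted average of candidate sentic values (one dimension).
    Assumption: the weights are raw frequency counts."""
    total = sum(freqs)
    if total == 0:
        return 0.0
    return sum(v * f for v, f in zip(values, freqs)) / total

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def adequacy(word, synset_words, vectors):
    """Average cosine similarity between `word` and the other words in a synset."""
    others = [w for w in synset_words if w != word and w in vectors]
    if word not in vectors or not others:
        return 0.0
    return sum(cosine(vectors[word], vectors[w]) for w in others) / len(others)

def most_adequate_synset(word, candidate_synsets, vectors):
    """Return the id of the candidate synset with the highest adequacy for `word`."""
    return max(
        candidate_synsets,
        key=lambda sid: adequacy(word, candidate_synsets[sid], vectors),
    )

# Toy data (hypothetical): real vectors would be the tuned fastText embeddings.
vectors = {
    "밝다": [1.0, 0.2],
    "환하다": [0.9, 0.3],
    "명랑하다": [0.1, 1.0],
}
candidates = {
    "synset_a": ["밝다", "환하다"],
    "synset_b": ["밝다", "명랑하다"],
}

best = most_adequate_synset("밝다", candidates, vectors)
averaged = weighted_sentic([0.8, 0.2], [3, 1])
print(best, averaged)
```

In this toy setup the word sits much closer to its neighbour in `synset_a`, so that synset would supply the sentic value; ties and out-of-vocabulary words would need a policy of their own.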
- SenticNet5
- CSenticNet
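- Korean fastText word embeddings
- KAIST Korean WordNet
- Yonsei Korean Dictionary (연세 한국어 사전), Standard Korean Language Dictionary (표준 국어 대사전), Korea University Korean Dictionary (고려대 한국어 대사전), Urimalsaem (우리말샘)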
- We can assign sentic values to 5,465 Korean words.
- Validated on 1,000 positive and 1,000 negative reviews from the NAVER movie review corpus (simple count after tokenizing with Kkma).
- Precision: 52.87% | Recall: 85.4% | F1: 65.31%
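
As a quick consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported figures. The sketch below covers only that arithmetic; how predictions were derived from the token counts is not shown here, and the function name is just illustrative.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Reported figures from the validation above, as fractions.
precision = 0.5287
recall = 0.854

f1 = f1_from(precision, recall)
print(f"F1: {f1:.2%}")  # F1: 65.31%
```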