Course paper for the 2019/20 academic year at DH Masters, HSE.
The ultimate goal of this paper is to analyse whether sentiment analysis is applicable to the Russian Drama Corpus. Apart from the problems of sentiment analysis as a task in itself, several further issues arise when it is applied to dramatic text. With this research, I want to find out whether it makes sense at all to use some of the popular tools on Russian drama.
I'm going to test several approaches and answer the following questions:
- How will different lexicons perform on our material?
- If we ask people to annotate the material manually, how different will the results be?
- How difficult will it be to design a machine learning model that gets what we need?
Content | File(s) | Additional |
---|---|---|
downloading data for the experiments | get_data.py, preprocessing.py | thanks DraCor API! |
experiment 1: Russian WordNet package for Python | wiki_ru_wordnet.ipynb | |
experiment 2: out-of-the-box solution, dostoyevsky | dostoyevsky.ipynb | |
preparing experiment 3: using sentiment lexicons for Russian to parse plays | extract_sentiment_from_plays.py | |
experiment 3: analyzing performance of various Russian lexicons of different origins | lexicons.ipynb | |
diving into emotional lines: most frequent items and word clouds | emotional_lines.ipynb | |
The literature list is available as a .bib file: emotions_drama_literature.bib.
Lexicon name | Year developed | Article title | Dataset link |
---|---|---|---|
ProductSentiRus | 2012 | Extraction of Russian Sentiment Lexicon for Product Meta-Domain | |
EmoLex | 2013 | Crowdsourcing a Word-Emotion Association | link |
Chen-Skiena's Lexicon | 2014 | Building Sentiment Lexicons for All Major Languages | link |
LinisCrowd | 2016 | An Opinion Word Lexicon and a Training Dataset for Russian Sentiment Analysis of Social Media | link |
RuSentiLex | 2017 | Creating a General Russian Sentiment Lexicon | .txt file |
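
To illustrate what "using a sentiment lexicon to parse plays" can look like, here is a minimal sketch of lexicon-based scoring for individual lines. It is not the code from extract_sentiment_from_plays.py: the toy lexicon, the function names `tokenize`, `score_line` and `coverage`, and the +1/-1 polarity scheme are all assumptions made for illustration. Each lexicon in the table above has its own file format and label set and would need its own loader. The sketch also skips lemmatization, so inflected word forms are not matched, which is likely to matter a lot for Russian.

```python
import re

# Hypothetical toy lexicon (word -> polarity); NOT taken from any lexicon above.
TOY_LEXICON = {
    "любовь": 1,
    "счастье": 1,
    "радость": 1,
    "горе": -1,
    "страх": -1,
    "смерть": -1,
}

# Runs of Cyrillic letters; punctuation and digits are dropped.
TOKEN_RE = re.compile(r"[а-яё]+", re.IGNORECASE)


def tokenize(line):
    """Split a line into lowercase Cyrillic word tokens (no lemmatization)."""
    return [token.lower() for token in TOKEN_RE.findall(line)]


def score_line(line, lexicon):
    """Count lexicon hits in a line and sum their polarities."""
    tokens = tokenize(line)
    hits = [lexicon[token] for token in tokens if token in lexicon]
    return {"tokens": len(tokens), "matched": len(hits), "score": sum(hits)}


def coverage(lines, lexicon):
    """Share of lines that contain at least one lexicon word."""
    if not lines:
        return 0.0
    matched = sum(1 for line in lines if score_line(line, lexicon)["matched"] > 0)
    return matched / len(lines)


if __name__ == "__main__":
    sample_lines = [
        "Какое счастье, какая радость!",
        "Смерть и страх, но любовь",
        "Кто там?",
    ]
    for line in sample_lines:
        print(line, score_line(line, TOY_LEXICON))
    print("coverage:", coverage(sample_lines, TOY_LEXICON))
```

Reporting the number of matched words next to the score makes it possible to compare lexicons by how much of the dramatic text they cover at all, not only by the polarity they assign.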