/rus-dra-emotions

Measuring Sentiment in Russian Drama: a course paper for the 2019/20 academic year at DH Masters, HSE


Evaluating Sentiment in Russian Drama

Course paper for the 2019/20 academic year at DH Masters, HSE.

About

The ultimate goal of this paper is to find out whether sentiment analysis is applicable to the Russian Drama Corpus. Apart from the problems of sentiment analysis as a task in itself, there are also several issues specific to applying it to dramatic text. With this research, I want to find out whether it makes sense at all to use some of the popular tools on Russian drama.

I'm going to test several approaches and answer the following questions:

  1. How will different lexicons perform on our material?

  2. If we ask people to annotate the texts manually, how different will the results be?

  3. How difficult will it be to design a machine learning model that gets us what we need?

Contents

| Content | File(s) | Additional |
| --- | --- | --- |
| downloading data for the experiments | get_data.py, preprocessing.py | thanks DraCor API! |
| experiment 1: Russian WordNet package for Python | wiki_ru_wordnet.ipynb | |
| experiment 2: out-of-the-box solution, dostoyevsky | dostoyevsky.ipynb | |
| preparing experiment 3: using sentiment lexicons for Russian to parse plays | extract_sentiment_from_plays.py | |
| experiment 3: analyzing performance of various Russian lexicons of different origins | lexicons.ipynb | |
| diving into emotional lines: most frequent items and word clouds | emotional_lines.ipynb | |
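To illustrate the idea behind the lexicon experiments, here is a minimal sketch of scoring spoken lines against a sentiment lexicon. It is not the code from extract_sentiment_from_plays.py: the toy lexicon, the function names, and the regex tokenizer are all assumptions made for this example, and no lemmatization is applied, so inflected forms would be missed.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> polarity label). The real lexicons
# are far larger and use their own label sets.
TOY_LEXICON = {
    "любовь": "positive",
    "счастье": "positive",
    "горе": "negative",
    "смерть": "negative",
}

# Naive tokenizer: runs of Cyrillic letters only.
TOKEN_RE = re.compile(r"[а-яё]+")


def tokenize(line):
    """Lowercase a line and split it into Cyrillic word tokens."""
    return TOKEN_RE.findall(line.lower())


def score_line(line, lexicon):
    """Count positive and negative lexicon hits in one spoken line."""
    counts = Counter(lexicon[t] for t in tokenize(line) if t in lexicon)
    return {"positive": counts["positive"], "negative": counts["negative"]}


if __name__ == "__main__":
    lines = [
        "О, любовь и счастье!",
        "Какое горе, какая смерть...",
        "Подайте чаю.",
    ]
    for line in lines:
        print(line, score_line(line, TOY_LEXICON))
```

Exact string matching like this only finds dictionary forms, so a real run would most likely need lemmatized text on both the play side and the lexicon side.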

Literature

Available as a .bib file: emotions_drama_literature.bib.

Lexicons used in the experiment

| Lexicon name | Year developed | Article description | Dataset link |
| --- | --- | --- | --- |
| ProductSentiRus | 2012 | Extraction of Russian Sentiment Lexicon for Product Meta-Domain | |
| EmoLex | 2013 | Crowdsourcing a Word-Emotion Association | link |
| Chen-Skiena's Lexicon | 2014 | Building Sentiment Lexicons for All Major Languages | link |
| LinisCrowd | 2016 | An Opinion Word Lexicon and a Training Dataset for Russian Sentiment Analysis of Social Media | link |
| RuSentiLex | 2017 | Creating a General Russian Sentiment Lexicon | .txt file |
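Since these lexicons come from different domains, two simple things one might compare are how much of a text each lexicon covers and how often two lexicons agree on the words they share. The sketch below shows one possible way to compute both; the function names and the tiny example lexicons are hypothetical and do not reflect the actual file formats or label sets of the lexicons listed above.

```python
def coverage(tokens, lexicon):
    """Share of tokens that appear in the lexicon (0.0 for an empty text)."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in lexicon)
    return hits / len(tokens)


def agreement(lex_a, lex_b):
    """Share of shared words that carry the same label in both lexicons.

    Returns None when the lexicons have no words in common.
    """
    shared = lex_a.keys() & lex_b.keys()
    if not shared:
        return None
    same = sum(1 for w in shared if lex_a[w] == lex_b[w])
    return same / len(shared)


if __name__ == "__main__":
    # Hypothetical miniature lexicons, for illustration only.
    lex_a = {"любовь": "positive", "горе": "negative"}
    lex_b = {"любовь": "positive", "горе": "neutral", "чай": "positive"}
    tokens = ["любовь", "и", "горе", "чай"]

    print(coverage(tokens, lex_a))   # 2 of 4 tokens found
    print(coverage(tokens, lex_b))   # 3 of 4 tokens found
    print(agreement(lex_a, lex_b))   # 1 of 2 shared words agree
```

Comparing labels directly assumes both lexicons use the same label names; in practice the labels would first have to be mapped to a common scheme.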