Stage Directions in Russian Drama

Stage directions, quite literally, don't count.

In: Hardin L. Aasand (ed.): Stage Directions in Hamlet. New essays and new directions. Madison et al. 2003, p. 226.

What is this all about?

This repo contains the code for my third-year coursework. Its title is "Linguistic Analysis of Stage Directions in Russian Drama from the 18th to the 20th Century", so it's going to be all about stage directions and all about linguistics :)

Check out my slides for the EADH 2018 conference here; they cover basically everything I did for this course paper.

Work objectives

Perform some neat corpus analysis on the Russian Drama Corpus.
A great result would be the classification of stage directions according to the TEI-P5 markup standard, which distinguishes 9 types of stage directions:
- setting,
- entrance,
- exit,
- business,
- novelistic,
- delivery,
- modifier,
- location,
- mixed.
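
To give a rough idea of where these types live in the data: in TEI, a stage direction is a stage element, and its type goes into the type attribute. The sketch below counts directions by type using only the standard library. The XML fragment and the function name count_stage_types are made up for illustration; this is not the code from the notebooks.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# TEI documents put their elements in this XML namespace.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# The nine types listed above.
STAGE_TYPES = {
    "setting", "entrance", "exit", "business", "novelistic",
    "delivery", "modifier", "location", "mixed",
}

# Invented sample fragment (an assumption, not taken from the corpus).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="act">
      <stage type="setting">A room in the house.</stage>
      <sp who="#anna">
        <speaker>Anna</speaker>
        <stage type="delivery">quietly</stage>
        <p>Who is there?</p>
      </sp>
      <stage type="entrance">Enter Ivan.</stage>
      <stage>Silence.</stage>
      <stage type="exit">Exit Ivan.</stage>
    </div>
  </body></text>
</TEI>"""


def count_stage_types(tei_xml):
    """Count stage elements by their type attribute.

    Directions with no type, or with a value outside the nine
    listed types, are grouped under "untyped".
    """
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for stage in root.iter(TEI_NS + "stage"):
        stage_type = stage.get("type")
        if stage_type not in STAGE_TYPES:
            stage_type = "untyped"
        counts[stage_type] += 1
    return counts


counts = count_stage_types(SAMPLE_TEI)
for stage_type, n in sorted(counts.items()):
    print(stage_type, n)
```

Directions that end up in the "untyped" bucket are the ones a classifier would have to label.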

What's in the repo?

File/folder	What's inside
csv/	Comma-separated files with datasets
figures/	Figures from plot-plays.ipynb
requirements.txt	List of packages required to run the notebooks
directions-basic.ipynb	Extracting some basic information about plays
means-merged-features.ipynb	Mean POS counts, merging with another dataset
plot-plays.ipynb	Drawing different plots visualising the data we got
classification.ipynb	Classifying the directions into TEI-P5 types
frequent-pos.ipynb	Most frequent parts of speech in the corpus

Dependencies

All the dependencies are listed in requirements.txt. As a side note, most of the packages ship with Anaconda. If you have it installed, you'll only need to install nltk yourself and then download the NLTK data. In Python, that looks like this:

import nltk
nltk.download()

"Roadmap" and current state-of-affairs

get the corpus from the repository
extract basic information
get mean values of different parts-of-speech
try to visualize the data (because why not)
annotate some directions
do machine learning experiments: kNN, Decision Tree, and Random Forest are run in classification.ipynb
write a paper
present the paper and the results with the handouts
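
The "mean values of different parts-of-speech" step from the roadmap above could look roughly like the sketch below. It assumes each direction has already been tagged and is available as a list of (word, tag) pairs. The toy data, the tag names, and the function name mean_pos_counts are all assumptions for illustration; the notebooks may do this differently.

```python
from collections import Counter


def mean_pos_counts(tagged_directions):
    """Return the mean number of tokens per direction for each POS tag.

    tagged_directions is a list of directions; each direction is a
    list of (word, tag) pairs.
    """
    if not tagged_directions:
        return {}
    totals = Counter()
    for direction in tagged_directions:
        # Count every tag occurrence in this direction.
        totals.update(tag for _word, tag in direction)
    n = len(tagged_directions)
    return {tag: total / n for tag, total in totals.items()}


# Hand-tagged toy directions (invented, not taken from the corpus).
TOY_DIRECTIONS = [
    [("входит", "VERB"), ("Иван", "PROPN")],
    [("тихо", "ADV")],
    [("садится", "VERB"), ("за", "ADP"), ("стол", "NOUN")],
    [("уходит", "VERB")],
]

means = mean_pos_counts(TOY_DIRECTIONS)
for tag, mean in sorted(means.items()):
    print(tag, mean)
```

Dividing by the number of directions, rather than by the number of tokens, keeps the values comparable between plays with short and long directions.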

Deadlines

May 18th — sharing the paper with the reviewer,
May 22nd — uploading the paper into the system,
May 25th — presentation.

Source corpus

I'm using RusDraCor. You can explore it on its website or download it from its GitHub repository.

creaciond/russian-drama