In: Hardin L. Aasand (ed.): Stage Directions in Hamlet. New essays and new directions. Madison et al. 2003, p. 226.
This repo contains the code for my third-year coursework. Its title is Linguistic Analysis of Stage Directions in Russian Drama from the 18th to the 20th Century, so it's going to be all stage directions and all linguistics :)
Check out my slides for the EADH 2018 conference here; they cover basically everything I did for this course paper.
Perform some neat corpus analysis on the Russian Drama Corpus.
A great result would be a classification of stage directions according to the TEI P5 markup standard, which suggests 9 types of stage directions:
- setting,
- entrance,
- exit,
- business,
- novelistic,
- delivery,
- modifier,
- location,
- mixed.
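To make the nine types concrete, here is a minimal sketch of how typed stage directions could be counted in a TEI file using only the standard library. It is an illustration, not the code from the notebooks: the `SAMPLE` document is made up, `count_stage_types` is a hypothetical helper, and it assumes the directions are encoded as `<stage type="...">` elements in the TEI namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The nine stage direction types listed above.
STAGE_TYPES = {
    "setting", "entrance", "exit", "business", "novelistic",
    "delivery", "modifier", "location", "mixed",
}


def count_stage_types(tei_xml: str) -> Counter:
    """Count <stage> elements per type; anything else goes to 'untyped'."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for stage in root.iterfind(".//tei:stage", TEI_NS):
        stage_type = stage.get("type")
        if stage_type in STAGE_TYPES:
            counts[stage_type] += 1
        else:
            # Missing or unknown type attribute.
            counts["untyped"] += 1
    return counts


# Made-up sample, only for illustration (not taken from the corpus).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<stage type="setting">A room in the house.</stage>
<sp><speaker>Anna</speaker><stage type="delivery">quietly</stage><p>Hello.</p></sp>
<stage type="exit">Exit Anna.</stage>
<stage>Curtain.</stage>
</body></text></TEI>"""

counts = count_stage_types(SAMPLE)
print(counts)
```

Directions without a recognised `type` attribute are counted separately, since those are the ones a classifier would have to label.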
File/folder | What's inside |
---|---|
csv/ | Comma-separated files with datasets |
figures/ | Figures from plot-plays.ipynb |
requirements.txt | List of packages required to run the notebooks |
directions-basic.ipynb | Extracting some basic information about plays |
means-merged-features.ipynb | Mean POS counts, merging with another dataset |
plot-plays.ipynb | Drawing different plots visualising the data we got |
classification.ipynb | Classifying the directions into TEI P5 types |
frequent-pos.ipynb | Most frequent parts of speech in the corpus |
All the dependencies are listed in `requirements.txt`. As a side note, most of the packages ship with Anaconda. If you have it installed, you'll only need to install `nltk` yourself and then download the NLTK data. In Python, that looks like this:

    import nltk
    nltk.download()
- get the corpus from the repository
- extract basic information
- get mean values of different parts-of-speech
- try to visualize the data (because why not)
- annotate some directions
- do machine learning experiments (kNN, Decision Tree, and Random Forest, all run in classification.ipynb)
- write a paper
- present the paper and the results with the handouts
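The "mean values of different parts of speech" step above can be sketched roughly as follows. This is a standard-library illustration and not the code from `means-merged-features.ipynb`: the `TAGGED` data, the tag names, and the `mean_pos_counts` helper are all hypothetical, and the sketch assumes the directions have already been tokenised and POS-tagged.

```python
from collections import Counter
from statistics import mean

# Hypothetical pre-tagged input: play id -> list of stage directions,
# each direction being a list of (token, POS tag) pairs.
TAGGED = {
    "play-a": [
        [("enters", "VERB"), ("slowly", "ADV")],
        [("sits", "VERB"), ("at", "ADP"), ("table", "NOUN")],
    ],
    "play-b": [
        [("aside", "ADV")],
    ],
}


def mean_pos_counts(directions):
    """Return, for each POS tag, its mean count per stage direction."""
    # One Counter of POS tags per direction.
    per_direction = [Counter(pos for _, pos in d) for d in directions]
    # Every tag seen anywhere in this play.
    tags = set().union(*per_direction)
    # A tag absent from a direction counts as 0 there.
    return {tag: mean(c[tag] for c in per_direction) for tag in sorted(tags)}


means = {play: mean_pos_counts(dirs) for play, dirs in TAGGED.items()}
print(means)
```

Each play ends up with one mean per tag, which is the kind of per-play feature row that can then be merged with other datasets or fed to a classifier.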
- May 18th — sharing the paper with the reviewer,
- May 22nd — uploading the paper into the system,
- May 25th — presentation.
I'm using RusDraCor. It can be explored on its site, and it can also be downloaded from its GitHub repository.