
Bachelor's thesis at NRU HSE: Short Text Classification: A Case of Stage Directions in Russian Drama


Short Text Classification: a Case of Stage Directions in Russian Drama

This is the repository for a BA thesis written at the School of Linguistics, NRU HSE (Moscow, RU), during the 2018/19 academic year.

UPD: all notebooks are now available in Binder!

What is this about?

I have already done a project on the Russian Drama Corpus and its stage directions; it can be found here: Stage Directions in Russian Drama. That project focused on the quantitative side of the research and on exploring corpus trends. In this one, I focus on the computational linguistics and machine learning side: extracting linguistic features and fitting several different models.
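As a rough illustration of what a "linguistic feature" of a short stage direction can look like, here is a minimal Python sketch of a few surface features. It is not the feature set from linguistic-features.ipynb: `surface_features` is a hypothetical name, the stopword list is a tiny made-up sample, and counting capitalised tokens is only a crude stand-in for real named entity recognition.

```python
import re

# Hypothetical mini stopword list, for illustration only; a real pipeline
# would use a full Russian stopword list.
STOPWORDS = {"и", "в", "на", "с", "к", "не", "за", "из", "по", "а"}
TOKEN_RE = re.compile(r"\w+")


def surface_features(direction: str) -> dict[str, float]:
    """Compute a few simple features for one stage direction (hypothetical set)."""
    tokens = TOKEN_RE.findall(direction)
    n = len(tokens)
    stripped = direction.strip()
    return {
        "n_tokens": float(n),
        # Share of tokens that are stopwords.
        "stopword_ratio": sum(t.lower() in STOPWORDS for t in tokens) / n if n else 0.0,
        # Capitalised tokens after the first one: a crude proxy for names, not real NER.
        "capitalised_ratio": sum(t[0].isupper() for t in tokens[1:]) / n if n else 0.0,
        # Whether the whole direction is wrapped in parentheses.
        "in_parentheses": float(stripped.startswith("(") and stripped.endswith(")")),
    }


print(surface_features("Иван и Мария входят."))
# → {'n_tokens': 4.0, 'stopword_ratio': 0.25, 'capitalised_ratio': 0.25, 'in_parentheses': 0.0}
```

Features like these are cheap to compute for very short texts; morphological tags (e.g. the tense or person of the main verb) would normally be added on top of them.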

The Russian Drama Corpus (RusDraCor for short) can be found at dracor-org/rusdracor; it is also available in a more user-friendly format on the DraCor website.
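RusDraCor plays are TEI-encoded, and stage directions are marked up with the TEI `<stage>` element. Assuming you already have a play's TEI XML as a string (for example, read from a file in dracor-org/rusdracor), a minimal standard-library sketch for pulling the directions out could look like this. `extract_stage_directions` is a hypothetical helper, not the code in dracor_api.py or file_work.py.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI = "http://www.tei-c.org/ns/1.0"


def extract_stage_directions(tei_xml: str) -> list[str]:
    """Return the text of every TEI <stage> element, in document order (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    directions = []
    for stage in root.iter(f"{{{TEI}}}stage"):
        # itertext() also collects text from nested elements such as <hi>;
        # split/join collapses line breaks and repeated spaces.
        text = " ".join("".join(stage.itertext()).split())
        if text:
            directions.append(text)
    return directions


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div>
  <sp who="#ivan"><speaker>Иван</speaker><stage>(входит)</stage><p>Здравствуйте!</p></sp>
  <stage>Иван уходит.</stage>
</div></body></text></TEI>"""
print(extract_stage_directions(sample))  # ['(входит)', 'Иван уходит.']
```

For files on disk, `ET.parse(path).getroot()` can take the place of `ET.fromstring`; the DraCor website also serves stage directions per play, which avoids parsing TEI locally.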

What's inside?

Code

| Content | Notebook | Additional |
|---|---|---|
| retrieving and downloading data | api-data-preprocessing.ipynb | dracor_api.py, file_work.py |
| annotation description | | annotation_guide.md |
| morphology, NER, stopwords, etc. | linguistic-features.ipynb | |
| semantics hypothesis + test on 2018 data | semantic-rules.ipynb | |
| working with the final dataset | dataset-separation.ipynb | |
| model fitting: entrance and exit | fitting-semantic-types.ipynb | data_preparation.py, model_fitting.py; separate semantic class to come |
| model fitting: other types | fitting-nonsemantic-types.ipynb | data_preparation.py, model_fitting.py |
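The model-fitting notebooks compare several classifiers. As a self-contained stand-in (not the models, features or data from the thesis), here is a from-scratch multinomial Naive Bayes with add-one smoothing that separates entrance, exit and other directions on a handful of made-up examples. `NaiveBayes`, `tokenize` and `TRAIN` are all hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict

# Made-up toy examples for illustration, not data from the thesis.
TRAIN = [
    ("Входит Иван.", "entrance"),
    ("Вбегает Мария.", "entrance"),
    ("Входят гости.", "entrance"),
    ("Иван уходит.", "exit"),
    ("Уходит, хлопнув дверью.", "exit"),
    ("Мария убегает.", "exit"),
    ("Пауза.", "other"),
    ("Смеётся.", "other"),
    ("Молчание.", "other"),
]


def tokenize(text: str) -> list[str]:
    """Lowercased word forms; no lemmatization."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (hypothetical sketch)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / n_docs)
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # words unseen in training carry no evidence
                score += math.log(
                    (self.word_counts[label][w] + 1) / (self.total_words[label] + v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


texts, labels = zip(*TRAIN)
model = NaiveBayes().fit(texts, labels)
print(model.predict("Входит Пётр."))   # entrance
print(model.predict("Пётр уходит."))   # exit
print(model.predict("Долгая пауза."))  # other
```

Because the tokenizer works on raw word forms, "входит" and "входят" are separate features here; lemmatizing first (the morphology step listed in the table) would let them share evidence, which matters a lot for texts this short.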

Original paper

The paper is also available in this repository as a PDF.

Slides

TEI 2019 (Sep 20, 2019): Using Machine Learning for the Automated Classification of Stage Directions in TEI-Encoded Drama Corpora

Thesis defence (Jun 17, 2019): Short Text Classification: a Case of Stage Directions in Russian Drama

Important dates

| Module | Event | Date |
|---|---|---|
| 3rd | Project proposal presentation | March 26, 2019 |
| 4th | Written paper deadline | June 4, 2019 |
| 4th | Final thesis presentation | June 18, 2019 |