/AuthorStyle

Stylometry in the Modern Era: Coreference and Voice for Authorship Attribution

Primary LanguagePython

Stylometry in the Modern Era: Coreference and Voice for Authorship Attribution

Yong Sik Cho, Samuel Sharpe, Harry Smith

Virtual Environment

  • Python 3.6
pip3 install virtualenv
cd AuthorStyle
virtualenv -p python3 venv
source venv/bin/activate

Install Requirements

pip install -r requirements.txt

  • spaCy 2.0.12 (for compatibility with neuralcoref)
  • neuralcoref
git clone https://github.com/huggingface/neuralcoref.git
cd neuralcoref
pip install -e .
cd ..
rm -r neuralcoref
  • spaCy and neuralcoref models
./download_models.sh

Files

Document.py - Document class that performes spacy processing and produces features by document

Corpus.py - Corpus class that aggregates documents and creates feature matricies

compile_corpus.py - Creates corpus for Guardian Dataset

models.py - Containes model classes for running experiments

experiments.py - Runs author attribution experiments on feature sets

summary_results.py - Summarize accuracy result comparisons.