Natural Language Analysis for identifying authors based on excerpts
Project Title:
Spooky Author - Predictive Author Identification
Project Outline:
Predict the author of excerpts from horror stories by Edgar Allan Poe, Mary Shelley, and H. P. Lovecraft.
Goals:
- Analyze sentence length, word length, word variety (vocabulary), punctuation, pronoun usage, and sentiment scores for each author in a training dataset
- Estimate the probability that each author wrote each excerpt, and predict authorship for a test dataset that has no author labels
- Compare the training dataset analyses with the same analyses on the test dataset once authors have been predicted
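
As a rough illustration of the per-excerpt measurements listed in the goals, the sketch below computes sentence length, word length, vocabulary variety, punctuation, and pronoun usage with the Python standard library only. The function name style_features, the regex-based sentence and word splitting, and the short pronoun list are assumptions made for this example and are not taken from the project code. Sentiment scoring is left out because it needs an external lexicon.

```python
import re
import string

# Small illustrative pronoun list (an assumption, not the project's list).
PRONOUNS = {
    "i", "me", "my", "we", "us", "our", "you", "your",
    "he", "him", "his", "she", "her", "it", "its",
    "they", "them", "their",
}


def style_features(text):
    """Return simple stylometric measurements for one excerpt."""
    # Naive sentence split on runs of '.', '!' or '?'.
    # NLTK's sentence tokenizer would handle abbreviations better.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are lowercase letter runs, optionally with one inner apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    punctuation = [ch for ch in text if ch in string.punctuation]

    # Guard against division by zero on empty input.
    n_words = len(words) or 1
    n_sentences = len(sentences) or 1
    return {
        "words_per_sentence": len(words) / n_sentences,
        "mean_word_length": sum(len(w) for w in words) / n_words,
        "vocabulary_ratio": len(set(words)) / n_words,
        "punctuation_per_word": len(punctuation) / n_words,
        "pronoun_ratio": sum(w in PRONOUNS for w in words) / n_words,
    }


if __name__ == "__main__":
    print(style_features("It was dark. I saw it!"))
```

Averaging these values over all training excerpts by one author would give a per-author profile that can be compared against the same measurements on the test predictions.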
Notes:
- Uses the Kaggle dataset for the Spooky Author Identification competition (https://www.kaggle.com/c/spooky-author-identification)
- The NLTK library requires an additional data download. See the Interactive Installer section of http://www.nltk.org/data.html for more information.
- A description of the files contained in this repo can be found in file_descriptions.txt
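
For the NLTK data download mentioned in the notes, a non-interactive alternative to the Interactive Installer might look like the sketch below. The helper name ensure_nltk_data is hypothetical, and the package names are assumptions: this README does not say which NLTK data packages the project needs, so "punkt" (tokenizer models) and "vader_lexicon" (a sentiment lexicon) appear only as plausible examples.

```python
def ensure_nltk_data(packages=("punkt",)):
    """Download the named NLTK data packages and report success per package."""
    # Imported inside the function so this file loads even if NLTK
    # has not been installed yet.
    import nltk

    results = {}
    for name in packages:
        # nltk.download returns True when the package is available afterwards.
        results[name] = nltk.download(name, quiet=True)
    return results


if __name__ == "__main__":
    # Package names here are assumptions; adjust to what the scripts import.
    print(ensure_nltk_data(("punkt", "vader_lexicon")))
```

Running this requires network access the first time. Later runs find the data already installed.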