/author-id

Identifying which of two authors wrote a line of text, using Project Gutenberg source material.

Primary language: Jupyter Notebook

Author identification

Serves as practice for the Intro to Data Science (DSE I1020) final. Can we correctly classify lines of text from the plays of Ben Jonson and William Shakespeare? We pull four plays by each author from Project Gutenberg and split them into lines. The lines are then vectorized and used to train and test a logistic regression classifier.
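The pipeline described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it assumes scikit-learn, picks `TfidfVectorizer` as the vectorizer (the README does not say which one is used), and the helper names `split_into_lines` and `train_author_classifier` are hypothetical. The two short texts are invented placeholders standing in for the downloaded Project Gutenberg plays; they are not quotations from either author.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def split_into_lines(play_text):
    """Return the non-empty lines of a play, stripped of surrounding whitespace."""
    return [line.strip() for line in play_text.splitlines() if line.strip()]


def train_author_classifier(lines, labels, test_size=0.25, seed=0):
    """Vectorize the lines, fit logistic regression, return (model, test accuracy)."""
    train_x, test_x, train_y, test_y = train_test_split(
        lines, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # Assumption: TF-IDF features; a plain CountVectorizer would slot in the same way.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(train_x, train_y)
    return model, model.score(test_x, test_y)


# Placeholder texts invented for this sketch (NOT real quotations).
# In practice these would be the full texts of the plays from Project Gutenberg.
JONSON_TEXT = """
the merchant counts his coin by candle light
a cunning knave will cheat the city fool

the alchemist pretends to conjure gold
our humours rule the tavern and the street
the gull is fleeced before the clock strikes noon
a sharper wit than his the town has never seen
"""

SHAKESPEARE_TEXT = """
the crown sits heavy on a troubled king
a ghost walks nightly on the castle wall

the lovers flee into the moonlit wood
the storm has wrecked the ship upon the isle
ambition drives the thane to bloody deeds
the fool speaks truth the court will never hear
"""

jonson_lines = split_into_lines(JONSON_TEXT)
shakespeare_lines = split_into_lines(SHAKESPEARE_TEXT)

lines = jonson_lines + shakespeare_lines
labels = ["jonson"] * len(jonson_lines) + ["shakespeare"] * len(shakespeare_lines)

model, accuracy = train_author_classifier(lines, labels)
print(f"held-out accuracy: {accuracy:.2f}")
print(model.predict(["the king has lost his crown"]))
```

With only a dozen placeholder lines the reported accuracy is meaningless; the point is the shape of the workflow. Stratifying the split keeps both authors represented in the training and test sets, which matters if the plays contribute unequal numbers of lines.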