/LaTeCH-CLfL-2019-GreekClassification

Replication code for Gianitsos et al., "Stylometric Classification of Ancient Greek Literary Texts by Genre," LaTeCH-CLfL 2019

Primary language: Python. License: MIT.

Genre Classifier

We are data mining a corpus of ancient texts to train machine learning classifiers that distinguish between different genres.

Setup (Instructions for Mac)

Open the Terminal app

Check if you have Python 3.6 installed:

which python3.6

If Python 3.6 is installed, this command prints its path, for example /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6. If it prints nothing, download Python 3.6 here: https://www.python.org/downloads/release/python-368/

Ensure that you have the Xcode command-line tools installed on your Mac by running the following:

xcode-select --install

If you are prompted with a dialog box, then select Install.

Check that you have brew installed:

which brew

If brew is installed, this command prints its path: /usr/local/bin/brew on Intel Macs, or /opt/homebrew/bin/brew on Apple Silicon Macs. If it prints nothing, install brew with the following command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Install pipenv:

brew install pipenv

If pipenv was installed previously, you may need to run brew reinstall pipenv.
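
The prerequisite checks above can also be scripted. The following is a minimal sketch, not part of this repository, that uses Python's standard shutil.which to look up the same three tools on the PATH. The helper name check_tools is hypothetical.

```python
import shutil


def check_tools(names=("python3.6", "brew", "pipenv")):
    """Map each tool name to its path on PATH, or None if it is not found.

    check_tools is a hypothetical helper, not part of this repository.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool, mirroring the `which` checks in the setup steps.
    for name, path in check_tools().items():
        print("%-10s %s" % (name, path or "NOT FOUND"))
```

Any tool reported as NOT FOUND still needs to be installed using the corresponding step above.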

(Optional) Set the PIPENV_VENV_IN_PROJECT environment variable by executing the following lines, which append to ~/.bash_profile. This only needs to be done once. If your shell is zsh (the macOS default since Catalina), append to ~/.zshrc instead.

echo "#When pipenv makes a virtual environment, it will create it in the same directory as the project instead of ~/.local/share/virtualenv/" >> ~/.bash_profile
echo "PIPENV_VENV_IN_PROJECT=true" >> ~/.bash_profile
echo "export PIPENV_VENV_IN_PROJECT" >> ~/.bash_profile

Close the terminal, then reopen it.
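
With this variable set, pipenv places the virtual environment in a .venv folder inside the project rather than in a shared location. Once the environment has been created in the later steps, a check along the following lines can confirm where it ended up. This is a minimal sketch. The helper name venv_status is hypothetical and not part of this repository.

```python
import os


def venv_status(project_dir="."):
    """Report the PIPENV_VENV_IN_PROJECT setting and whether .venv exists.

    venv_status is a hypothetical helper, not part of this repository.
    """
    return {
        # None means the variable is not set in the current shell session.
        "PIPENV_VENV_IN_PROJECT": os.environ.get("PIPENV_VENV_IN_PROJECT"),
        # True when the project folder contains a .venv directory.
        "has_local_venv": os.path.isdir(os.path.join(project_dir, ".venv")),
    }


if __name__ == "__main__":
    print(venv_status())
```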

Clone this repository. Click the green 'clone' button on the right side of the GitHub page for this repo to copy the link, then run:

git clone <link you just copied>

Navigate inside the project folder:

cd <the project folder you just cloned>

Create and enter the virtual environment:

pipenv shell

Install dependencies:

pipenv install

Run the demo (this extracts features from a small sample of files and analyzes the results in one step):

python demo.py

Extract features from all files:

python run_feature_extraction.py all_data.pickle

Extract features from only drama and epic files:

python run_feature_extraction.py drama_epic_data.pickle drama epic
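
To confirm that a feature extraction run produced a readable file, the output can be opened with Python's standard pickle module. The sketch below assumes only that the file is an ordinary pickle. The internal layout of the data is not documented here, so it reports just the top-level type and length. If the file stores objects of classes defined in this repository, it may need to be run from the project folder inside the virtual environment. The helper name summarize_pickle is hypothetical.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and return (type name, length or None).

    summarize_pickle is a hypothetical helper, not part of this repository.
    Only unpickle files you created yourself: pickle can run arbitrary code.
    """
    with open(path, "rb") as f:
        data = pickle.load(f)
    # Not every object has a length, so fall back to None.
    size = len(data) if hasattr(data, "__len__") else None
    return type(data).__name__, size


# Example (after running the extraction commands above):
# print(summarize_pickle("all_data.pickle"))
```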

Run all model analyzer functions on the data from all files to distinguish prose from verse:

python run_ml_analyzers.py all_data.pickle labels/prosody_labels.csv all

Run all model analyzer functions on the data from only drama and epic files to distinguish drama from epic:

python run_ml_analyzers.py drama_epic_data.pickle labels/genre_labels.csv all
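
The two extraction commands and two analysis commands above can be chained from a single script. The following is a minimal sketch, not part of this repository, that builds exactly those four command lines and optionally runs them in order with the standard subprocess module. The function names build_commands and run_pipeline are hypothetical. It assumes it is run from the project folder inside the virtual environment.

```python
import subprocess
import sys


def build_commands(python=sys.executable):
    """Return the four commands from this README as argument lists.

    build_commands is a hypothetical helper, not part of this repository.
    """
    return [
        [python, "run_feature_extraction.py", "all_data.pickle"],
        [python, "run_feature_extraction.py", "drama_epic_data.pickle",
         "drama", "epic"],
        [python, "run_ml_analyzers.py", "all_data.pickle",
         "labels/prosody_labels.csv", "all"],
        [python, "run_ml_analyzers.py", "drama_epic_data.pickle",
         "labels/genre_labels.csv", "all"],
    ]


def run_pipeline(dry_run=True):
    """Print each command; also execute it unless dry_run is True."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

The default dry run only prints the commands. Calling run_pipeline(dry_run=False) executes them.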

To leave the virtual environment, use:

exit

To start the virtual environment again, use:

pipenv shell