We are data mining a corpus of ancient texts to train machine learning classifiers that distinguish between different genres.
Open the Terminal app
Check if you have Python 3.6
installed:
which python3.6
If it is installed, this command should have output a path. For example: /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6
. If nothing was output, download Python 3.6
here: https://www.python.org/downloads/release/python-368/
Ensure that you have the Xcode command-line tools installed on your Mac by running the following:
xcode-select --install
If you are prompted with a dialog box, then select Install
.
Check that you have brew
installed:
which brew
If it is installed, this command should have output the following path: /usr/local/bin/brew
. If nothing was output, install brew
with the following command:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Install pipenv
:
brew install pipenv
If pipenv
had already been installed in the past, you may have to run brew reinstall pipenv
.
(Optional) Set environment variable by executing the following lines (which will modify ~/.bash_profile
). This should only ever need to be done once.
echo "#When pipenv makes a virtual environment, it will create it in the same directory as the project instead of ~/.local/share/virtualenv/" >> ~/.bash_profile
echo "PIPENV_VENV_IN_PROJECT=true" >> ~/.bash_profile
echo "export PIPENV_VENV_IN_PROJECT" >> ~/.bash_profile
Close terminal, then repoen terminal
Clone this repository - click on green 'clone' button on right side of github webpage for this repo to copy the link:
git clone <link you just copied>
Navigate inside the project folder:
cd <the project folder you just cloned>
Create/Enter virtual environment:
pipenv shell
Install dependencies:
pipenv install
Run the demo (this does a feature extraction for a small sample of files, and analyzes the results in one step):
python demo.py
Extract features from all files:
python run_feature_extraction.py all_data.pickle
Extract features from only drama and epic files:
python run_feature_extraction.py drama_epic_data.pickle drama epic
Run all model analyzer functions on the data from all files to classify prose from verse:
python run_ml_analyzers.py all_data.pickle labels/prosody_labels.csv all
Run all model analyzer functions on the data from only drama and epic files to classify drama from epic:
python run_ml_analyzers.py drama_epic_data.pickle labels/genre_labels.csv all
To leave the virtual environment, use
exit
To start the virtual environment again, use
pipenv shell