Zach O'Brien
The objective of this project is to generate new poetry based on Walt Whitman's Leaves of Grass. To do so, two types of language models are used:
- N-gram (2-gram and 3-gram) language models
- LSTM character-based language model
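For readers new to n-gram models, here is a minimal, standard-library-only sketch of how a bigram or trigram model can generate text. A model counts which token follows each (n-1)-token context, then samples each next token in proportion to those counts. This is not the project's implementation, which lives in src/. The names `train_ngram` and `generate` are hypothetical, and the sample text is simply the opening line of "Song of Myself", lowercased and with punctuation removed.

```python
import random
from collections import Counter, defaultdict


def train_ngram(tokens, n=2):
    """Count how often each token follows each (n-1)-token context."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i : i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model


def generate(model, seed, length=10, rng=None):
    """Append up to `length` tokens, each sampled in proportion to its count."""
    rng = rng or random.Random()
    n_minus_1 = len(next(iter(model)))  # context size, e.g. 1 for a bigram model
    output = list(seed)
    for _ in range(length):
        context = tuple(output[-n_minus_1:])
        followers = model.get(context)  # .get() does not insert a default entry
        if not followers:  # context never seen in training: stop early
            break
        words, counts = zip(*followers.items())
        output.append(rng.choices(words, weights=counts)[0])
    return output


text = "i celebrate myself and sing myself and what i assume you shall assume"
tokens = text.split()
bigram = train_ngram(tokens, n=2)
print(" ".join(generate(bigram, seed=["i"], length=8, rng=random.Random(0))))
```

A trigram model is trained the same way with `n=3`. It needs a two-token seed and produces output that stays closer to the source text, at the cost of more unseen contexts.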
Name | Purpose |
---|---|
data/ | Raw source data, plus the intermediate datasets derived from it for model training |
scripts/ | Scripts that run individual steps of the processing pipeline, such as extracting poems from the raw data or training a model. poetry-generation.ipynb ties these together so that the entire project can be re-created from that notebook. |
src/ | Modular, reusable code shared by scripts/, test/, and poetry-generation.ipynb |
test/ | Unit tests for the code in src/ |
poetry-generation.ipynb | A Jupyter notebook that presents the project's work and final product. Running it reproduces the entire project from the raw data file alone. |
requirements.txt | External packages required by code in this project |
This project's dependencies are specified in requirements.txt (and in requirements-apple-silicon.txt for Apple silicon Macs), intended for use with Python's built-in venv virtual environment tool.

This project uses Python 3.9.10. Other Python versions might work, but 3.9.10 is recommended.
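If you are unsure which interpreter or requirements file applies to your machine, the sketch below checks both. It is a hypothetical helper, not part of this project. The function names are illustrative, and it assumes that "Apple silicon" means `platform.system()` reports `"Darwin"` and `platform.machine()` reports `"arm64"`. Note that an Intel build of Python running under Rosetta reports `"x86_64"` instead.

```python
import platform
import sys

EXPECTED_PYTHON = (3, 9, 10)  # version this project was developed with


def python_version_ok(actual=None, expected=EXPECTED_PYTHON):
    """Return True if (major, minor, micro) of the interpreter matches `expected`."""
    actual = tuple(actual or sys.version_info[:3])
    return actual == expected


def requirements_file(system=None, machine=None):
    """Pick the requirements file: Apple silicon Macs get their own list."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "requirements-apple-silicon.txt"
    return "requirements.txt"


if __name__ == "__main__":
    if not python_version_ok():
        expected = ".".join(map(str, EXPECTED_PYTHON))
        print(f"Warning: running Python {platform.python_version()}, expected {expected}")
    print("Install with: python -m pip install -r", requirements_file())
```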
1. Install Python 3.9.10, and use that version for the following steps.
2. Create a new virtual environment for this project:
   ```
   python3 -m venv env
   ```
3. Activate the virtual environment:
   ```
   # On Windows:
   env\Scripts\activate.bat
   # On Unix or macOS:
   source env/bin/activate
   ```
4. Install dependencies. On Apple silicon:
   ```
   # With the env virtual environment activated:
   python -m pip install -r requirements-apple-silicon.txt
   ```
   On all other platforms, including Intel-based Macs:
   ```
   # With the env virtual environment activated:
   python -m pip install -r requirements.txt
   ```
5. Install the prerequisite Natural Language Toolkit (NLTK) data. Note that on Linux and macOS this creates a new nltk_data/ directory in your home directory, into which the data is installed.
   ```
   # With the env virtual environment activated:
   python -m nltk.downloader punkt
   ```
6. Install this project's modular source code. This step is critical: if it is skipped, imports will not work.
   ```
   # With the env virtual environment activated:
   cd src/
   # Now, in the src/ directory:
   python -m pip install -e .
   ```
7. Verify that the installation was successful by running the unit test suite:
   ```
   # From the top-level project directory (run "cd .." if you are still in src/):
   python -m pytest test/
   ```
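In addition to the test suite, you can confirm that the virtual environment is active and that the NLTK punkt data can be found. Here is a minimal sketch, assuming you save it as a hypothetical check_env.py and run it with the env environment's Python. It relies on the standard check that `sys.prefix` differs from `sys.base_prefix` inside a venv, and on `nltk.data.find`, which raises `LookupError` when a resource is missing.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """A venv is active when sys.prefix differs from sys.base_prefix."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


def punkt_installed():
    """Return True if NLTK is importable and the punkt data can be located."""
    try:
        import nltk
    except ImportError:  # dependencies not installed in this interpreter
        return False
    try:
        nltk.data.find("tokenizers/punkt")
    except LookupError:  # NLTK is present but punkt was never downloaded
        return False
    return True


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("NLTK punkt data found:", punkt_installed())
```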
Steps 8 and 9 are only required if you wish to run the Jupyter notebook.
8. Create an ipykernel kernel so that the Jupyter notebook can access the virtual environment:
   ```
   # With the env virtual environment activated:
   python -m ipykernel install --user --name=env
   ```
9. Open Jupyter Lab, navigate to poetry-generation.ipynb, and select the env kernel:
   ```
   # With the env virtual environment activated:
   jupyter-lab
   ```
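If the env kernel does not appear in Jupyter Lab, you can check whether the ipykernel step registered it. The command `jupyter kernelspec list` shows this. The hypothetical helper below does the same from Python, using `jupyter_client`'s `KernelSpecManager.find_kernel_specs()`, which returns a dict that maps kernel names to their directories. It assumes the kernel was installed under the name env, as in the command above.

```python
def kernel_registered(name="env", specs=None):
    """Return True if a Jupyter kernel spec called `name` is installed.

    `specs` maps kernel names to directories; when omitted, it is looked up
    with jupyter_client's KernelSpecManager.
    """
    if specs is None:
        try:
            from jupyter_client.kernelspec import KernelSpecManager
        except ImportError:  # Jupyter is not installed in this interpreter
            return False
        specs = KernelSpecManager().find_kernel_specs()
    return name in specs


if __name__ == "__main__":
    print("'env' kernel registered:", kernel_registered())
```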
To run the unit tests at any time, first activate the env virtual environment. Then, from the top-level project directory:
```
python -m pytest test/
```