CS6120-Assignment-1
Please open and run this notebook with the Anaconda distribution of Jupyter Notebook.
Ensure the following folders exist:
- brown
- brownmeta
- gutenberg
- imdb_data
- news_data
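The folder check can be scripted. The following is a minimal sketch, not part of the assignment code: the function name `ensure_folders` and the `root` parameter are hypothetical, and it assumes the folders belong in the same directory as the notebook.

```python
from pathlib import Path

# Folder names are taken from the list above.
REQUIRED_FOLDERS = ["brown", "brownmeta", "gutenberg", "imdb_data", "news_data"]


def ensure_folders(root="."):
    """Create any missing data folders under `root`; return the names created.

    `root` is an assumption: it should point at the directory holding the notebook.
    """
    created = []
    for name in REQUIRED_FOLDERS:
        folder = Path(root) / name
        if not folder.is_dir():
            folder.mkdir(parents=True)
            created.append(name)
    return created
```

Calling `ensure_folders()` from the notebook's directory creates whichever folders are missing and leaves existing ones untouched.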
Instructions for data extraction:
- brown
  - Extract the browncopy_2018.zip into this folder
- brownmeta
  - The science_sample.txt file needs to be in here
  - Folder is populated by a number of temporary files and outputs, including several deliverables:
    - bigram counts
    - unigram counts
    - word-tag counts
    - generated sentences from HMM
    - tagged science sentences
- gutenberg
  - Simply extract gutenberg data here
- imdb_data
  - Extract imdb_data.zip here
- news_data
  - Extract news_data.zip here
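The zip extraction steps above could be automated roughly as follows. This is a sketch under assumptions: the archives are assumed to sit in the same directory as the notebook, the function name `extract_archives` is hypothetical, and the gutenberg data is left out because its archive name is not given here.

```python
import zipfile
from pathlib import Path

# Archive names and target folders come from the instructions above.
ARCHIVES = {
    "browncopy_2018.zip": "brown",
    "imdb_data.zip": "imdb_data",
    "news_data.zip": "news_data",
}


def extract_archives(root="."):
    """Extract each archive found under `root` into its folder; return those extracted.

    `root` is an assumption: the directory that holds both the zips and the folders.
    """
    extracted = []
    for archive_name, folder in ARCHIVES.items():
        archive = Path(root) / archive_name
        if not archive.is_file():
            continue  # archive not present, nothing to extract
        target = Path(root) / folder
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive_name)
    return extracted
```

If an archive already contains a top-level folder of the same name, this produces a nested folder, so check the resulting layout against what the notebook expects.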
Several deliverables are already listed above under the brownmeta folder. All of them are generated by running the code. The remaining deliverables, such as the n-gram counts, are in the root of the Assignment1 folder, namely:
- gutenberg4-grams.txt
- imdb-ngrams.txt
- news-4grams.txt
Most other files are cache files that speed up repeated runs of the algorithms.
Finally, the Jupyter notebook and its PDF output are included.
This repository is also available on GitHub as a private repository. Email c.dilger1@gmail.com for access.