CS6120-Assignment-1: A Jupyter Notebook repository from cdilga

Use the Anaconda distribution of Jupyter Notebook to run this notebook.

Ensure the following folders exist:

- brown
- brownmeta
- gutenberg
- imdb_data
- news_data
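
As a convenience, the folders can be created from a notebook cell instead of by hand. This is a minimal sketch, not part of the assignment code; the helper name `ensure_folders` is hypothetical, and it assumes the folders belong in the directory the notebook runs from.

```python
from pathlib import Path

# Folder names taken from the list above.
REQUIRED_FOLDERS = ["brown", "brownmeta", "gutenberg", "imdb_data", "news_data"]


def ensure_folders(root="."):
    """Create any missing data folders under root and return their paths."""
    paths = []
    for name in REQUIRED_FOLDERS:
        path = Path(root) / name
        # exist_ok=True makes this safe to re-run on an existing checkout.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `ensure_folders()` with no arguments uses the current working directory; existing folders and their contents are left untouched.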

Instructions for data extraction:

brown
- Extract the browncopy_2018.zip into this folder
brownmeta
- Place the science_sample.txt file here
- The code populates this folder with temporary files and outputs, including several deliverables:
  - bigram counts
  - unigram counts
  - word-tag counts
  - generated sentences from HMM
  - tagged science sentences
gutenberg
- Extract the Gutenberg data here
imdb_data
- Extract imdb_data.zip here
news_data
- Extract news_data.zip here
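
The zip extraction steps above can also be scripted. The sketch below assumes the three named archives sit in the same directory as the notebook, which the README does not state; the helper name `extract_archives` is hypothetical. The Gutenberg data is left out because its archive name is not given, and science_sample.txt still has to be copied into brownmeta by hand.

```python
import zipfile
from pathlib import Path

# Archive name -> target folder, as listed in the instructions above.
# Assumption: the archives sit next to the notebook.
ARCHIVES = {
    "browncopy_2018.zip": "brown",
    "imdb_data.zip": "imdb_data",
    "news_data.zip": "news_data",
}


def extract_archives(root=".", archives=ARCHIVES):
    """Extract each archive found under root into its folder; return the names extracted."""
    root = Path(root)
    extracted = []
    for archive_name, folder in archives.items():
        archive_path = root / archive_name
        if not archive_path.is_file():
            continue  # skip archives that are not present
        target = root / folder
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(target)
        extracted.append(archive_name)
    return extracted
```

If an archive already contains a top-level folder of its own, the extracted files will end up one level deeper than the notebook may expect, so check the result before running the notebook.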

Deliverables

Several deliverables are written to the brownmeta folder, as listed above; all of them are generated by running the code. Other deliverables, such as the n-gram counts, are in the root of the Assignment1 folder:

gutenberg4-grams.txt
imdb-ngrams.txt
news-4grams.txt
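
For readers unfamiliar with what such count files contain, the sketch below shows one common way to count n-grams. It is an illustration only and not the notebook's implementation: the function name `count_ngrams`, the whitespace tokenization, and the sample sentence are all assumptions, and the actual format of the .txt files is not described here.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every contiguous window of n tokens in a token list."""
    # A list shorter than n yields no windows and therefore an empty Counter.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Assumption: naive whitespace tokenization of a made-up sentence.
tokens = "the cat sat on the mat and the cat sat on the rug".split()
fourgram_counts = count_ngrams(tokens, 4)

# Print the counts, most frequent first.
for ngram, count in fourgram_counts.most_common():
    print(" ".join(ngram), count)
```

The same function gives unigram or bigram counts by passing `n=1` or `n=2`.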

Most other files are cache files that speed up repeated runs of the algorithms.

Finally, the Jupyter notebook and its PDF output are included.

This repository is also available on GitHub as a private repository. Email c.dilger1@gmail.com for access.
