
Naive Bayes, Language Models, WSD and HMM POS Tagging

Primary language: Jupyter Notebook

CS6120-Assignment-1


Please run this notebook with the Anaconda distribution of Jupyter Notebook.

Ensure the following folders exist:

brown, brownmeta, gutenberg, imdb_data, news_data
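As a quick pre-flight check, the folder list above can be verified with a short Python script. This is a minimal sketch, not part of the original notebook; it assumes the script runs from the repository root, and the helper name `missing_dirs` is hypothetical.

```python
from pathlib import Path

# Folder names from the list above. Running from the repository root
# is an assumption; pass a different root if needed.
REQUIRED_DIRS = ["brown", "brownmeta", "gutenberg", "imdb_data", "news_data"]


def missing_dirs(root: Path = Path(".")) -> list[str]:
    """Hypothetical helper: return the required folders not found under `root`."""
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All data folders are present.")
```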

Instructions for data extraction:

  • brown
    • Extract the browncopy_2018.zip into this folder
  • brownmeta
    • Place the science_sample.txt file here
    • This folder is also populated with temporary files and outputs, including several deliverables:
      • bigram counts
      • unigram counts
      • word-tag counts
      • sentences generated by the HMM
      • tagged science sentences
  • gutenberg
    • Extract the Gutenberg data here
  • imdb_data
    • Extract imdb_data.zip here
  • news_data
    • Extract news_data.zip here
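The extraction steps above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the archives sit in the repository root and should be unpacked directly into their folders (the internal layout of the zip files is not documented here, so check the result). The Gutenberg archive name is not given in this README, so it is omitted. The helper names `extract_archives` and `has_science_sample` are hypothetical.

```python
import zipfile
from pathlib import Path

# Archive -> target folder, from the instructions above. Assumption: the
# archives are in the repository root. The Gutenberg archive is omitted
# because its file name is not given in the README.
ARCHIVES = {
    "browncopy_2018.zip": "brown",
    "imdb_data.zip": "imdb_data",
    "news_data.zip": "news_data",
}


def extract_archives(root: Path = Path(".")) -> list[str]:
    """Hypothetical helper: extract each archive found under `root`
    into its target folder and return the names of archives not found."""
    not_found = []
    for archive, folder in ARCHIVES.items():
        src = root / archive
        if not src.is_file():
            not_found.append(archive)
            continue
        dest = root / folder
        dest.mkdir(exist_ok=True)
        with zipfile.ZipFile(src) as zf:
            zf.extractall(dest)
    return not_found


def has_science_sample(root: Path = Path(".")) -> bool:
    """Hypothetical helper: science_sample.txt must be copied into brownmeta by hand."""
    return (root / "brownmeta" / "science_sample.txt").is_file()
```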

Deliverables

Several deliverables are listed above under the brownmeta folder; all of them are generated by running the notebook. Other deliverables, such as the n-gram counts, are written to the root of the Assignment1 folder:

  • gutenberg4-grams.txt
  • imdb-ngrams.txt
  • news-4grams.txt
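The format of these count files is not documented in this README. As a rough illustration of what n-gram counts like those in gutenberg4-grams.txt contain, the sketch below counts n-grams with `collections.Counter`. The `<s>`/`</s>` sentence padding and the tab-separated output layout are assumptions for illustration, not necessarily what the notebook does; `ngram_counts` and `write_counts` are hypothetical names.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Hypothetical helper: count n-grams over tokenized sentences.

    Each sentence is padded with n-1 start markers and one end marker,
    an assumed convention used here only for illustration.
    """
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts


def write_counts(counts, path):
    """Hypothetical helper: write one 'count<TAB>n-gram' line per entry,
    most frequent first. The real deliverables may use another layout."""
    with open(path, "w", encoding="utf-8") as f:
        for gram, count in counts.most_common():
            f.write(f"{count}\t{' '.join(gram)}\n")
```

For example, `ngram_counts([["the", "cat", "sat"]], 2)` counts `("<s>", "the")`, `("the", "cat")`, `("cat", "sat")` and `("sat", "</s>")` once each; `n=4` gives 4-grams in the same way.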

Most other files are cache files that speed up repeated runs of the algorithms.
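One common way to implement such caches in a notebook is to pickle an expensive result on the first run and load it on later runs. The sketch below shows that general pattern; the notebook's actual cache file names and formats are not documented here, so `cached` and its arguments are hypothetical.

```python
import pickle
from pathlib import Path


def cached(path, compute):
    """Hypothetical helper: load a result from `path` if it exists;
    otherwise call `compute()`, save the result to `path`, and return it."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

In this sketch, deleting a cache file forces the value to be recomputed on the next run. Because `pickle.load` can execute arbitrary code, only load cache files you created yourself.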

Finally, the Jupyter notebook and its PDF export are included.

This repository is also available as a private repository on GitHub; email c.dilger1@gmail.com for access.