Class: CS6601 - Artificial Intelligence (FA12)
Professor: Thad Starner
Submission Date: 2012/11/29
Team Members: Niklas Ulvinge  ulvinge@gmail.com
              Peter Vieira    pete.vieira@gmail.com

DIRECTORIES

######
ngrams
######

This directory contains the ngram files generated by the nGrams.py script. Each file contains one ngram per line, tab-delimited with its frequency count in the corpus.

####
test
####

This directory contains the GPL license texts used to test the algorithms.

SCRIPTS

#########
nGrams.py
#########

This script runs through the Brown corpus in the nltk package and creates 1-, 2-, 3-, 4-, and 5-gram files containing every ngram in the corpus, one per line, tab-delimited with its frequency count. These files are saved to the 'ngrams' directory.

########
spell.py
########

This script checks the document specified at the bottom of the script for errors and generates a file with a corrected sentence for every sentence in the document.

#######
test.py
#######

This script compares two documents using word error rate (WER) and gives a distance score from 0 to 1, with 0 meaning the documents match exactly.

###########
distance.py
###########

This script takes two strings, compares them using the Damerau-Levenshtein algorithm, and returns a distance value.

###########
nlputils.py
###########

This script contains all the NLP utility functions used by the other scripts.

#########
scorer.py
#########

This script generates a confusion set for each ngram in a list of sentences using the generate() function. It also contains a function, getScore(), which returns the score for each ngram.

############
browntest.py
############

This script selects random sentences from the Brown corpus in nltk and randomly inserts one error into each sentence. It then corrects the resulting document, scores the result, and compares it to the original.
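The ngram counting that nGrams.py performs can be sketched as follows. This is a minimal illustration, not the script itself: the function names are hypothetical, and it operates on a plain token list rather than loading the Brown corpus through nltk.

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_lines(counts):
    """Render counts as tab-delimited 'w1 w2 ...<TAB>freq' lines,
    most frequent first, matching the one-ngram-per-line file format."""
    return ["%s\t%d" % (" ".join(gram), freq) for gram, freq in counts.most_common()]

# Example: bigrams over a tiny token list.
counts = count_ngrams(["the", "cat", "sat", "on", "the", "cat"], 2)
# ("the", "cat") occurs twice; every other bigram occurs once.
```

Running this for n = 1 through 5 over the full corpus and writing each `ngram_lines` result to a file would reproduce the layout of the files in the 'ngrams' directory.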
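The edit distance used by distance.py can be sketched as below. This is a hedged illustration, not the contents of the script; it implements the optimal-string-alignment variant of Damerau-Levenshtein (insertions, deletions, substitutions, and adjacent transpositions), which is the variant commonly used in spell checking.

```python
def damerau_levenshtein(a, b):
    """Edit distance between strings a and b, counting insertions,
    deletions, substitutions, and transpositions of adjacent characters."""
    # d[i][j] = distance between a[:i] and b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

# damerau_levenshtein("abcd", "acbd") -> 1 (one transposition)
```

Treating a swap of adjacent letters as a single edit (rather than the two edits plain Levenshtein would charge) matters for spelling correction, since transpositions are among the most common typing errors.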
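The word-error-rate comparison done by test.py can be sketched as follows. This is an assumed implementation, not the script itself: WER is conventionally the word-level Levenshtein distance between the two documents, normalized by the reference length, so identical documents score 0.0.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two texts, divided by the
    number of words in the reference. 0.0 means a perfect match."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # word deleted
                          d[i][j - 1] + 1,        # word inserted
                          d[i - 1][j - 1] + cost) # word substituted
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substituted word out of three reference words -> WER of 1/3.
```

Note that WER normalized this way can exceed 1 when the hypothesis contains many insertions; for same-length documents that differ only by substitutions, as in the browntest.py setup, it stays within 0 to 1.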