CS 51 Final Project: Part-of-Speech Tagger and Highlighter Contributors: Billy Janitsch, Jenny Liu, Yuechen Zhao, Joy Zheng ----- COMPILATION AND RUNTIME INSTRUCTIONS ----- To build code, go into highest level in repository and run: javac -classpath . *.java NOTE: Due to multi-catch exception handling, a JDK of version 7 (i.e. 1.7) or higher is required. To execute with the GUI, run: java POSTaggerApp (NOTE: The tagger only tags complete, i.e. period-terminated, sentences. Trailing words will not be tagged.) To test accuracy using the current datafile, tagset, and simplified tagset in the folder, run: java POSTaggerApp test <directoryname> To view parts of speech without the GUI, run: java POSTaggerApp view <and then some text here that you want to see tagged.> (NOTE: the text to be tagged must be period-terminated.) To create a datafile from the command line, run: java POSTaggerApp compute <tagset location> <simple tagset location> <corpus directory> <datafile save location> ----- DEBUGGING ----- In particular, the corpus directory may NOT contain any non-corpus files. If it contains any files other than corpus files, you will be prompted for a new corpus directory. Check for hidden files in your corpus directory. To do so, add the -a flag to ls in terminal. Specifically, for Mac OS users, the Mac Finder has the tendency to create a file named ".DS_Store" in the corpus directory. Remove this file with terminal if this is the case. ----- DEFAULT OR INCLUDED FILE NAMES (see javadoc for details) ----- Files modified during runtime: datafile.txt : The file of compiled probabilities corpus_tagset.txt : The list of tags for Viterbi corpus_simple_tagset.txt : A list mapping tags in the full tagset to a smaller list of tags for highlighting Included default files: /corpus : a directory containing the Brown Corpus /backup : these are default backup files for use in case of user error. Any changes to these files is liable to cause operation problems with the program. webster_ dictionary.txt : an English-language dictionary based on the one found at http://www.gutenberg.org/files/29765/29765-8.txt thesaurus.txt: an Engligh-language thesaurus based on the one found at http://www.gutenberg.org/cache/epub/10681/pg10681.txt