/WordSenseDisambiguation

Class Project - Disambiguating words that have context-sensitive meanings. I.e., "bank" as a financial institution, or the side of a river.

Primary LanguagePostScript

Word Sense Disambiguation

By Gary Chen and Ryan Kwon


Instructions:

The interface for interacting with the project is primarily driven through the shell (.sh) files. These have generally been configured for UNIX machines, but should also work for Windows machines.

In order to run the project, these commands should be executed in terminal:

sh compile_all.sh -> Compiles all Java files, setting the classpath appropriately to point at the bin and jar folder.

sh run_program.sh -> Reads in a subset of the Guardian data set and creates a data file data.ser. This process may take 3-4 minutes. Once data.ser exists however, run_program.sh will read and execute in under a minute. This will allow you to interact with the program after selecting a scoring method of your choice. This author strongly recommends Sentence Match.

You will then be asked to enter in a sentence as well as specifying the ambiguous word (w) in the sentence. The program will then retrieve the top 10 sentences from the training corpus that appear to use w in the same sense.

sh run_program_with_dict.sh -> Reads in a subset of the Guardian data set, but retrieves definitions from WordNet rather than the Guardian text.


If you wish to run the TestData set, be sure to delete your local copy of data.ser. As data.ser will likely be serialized for the Guardian text, your program will not read in the TestData set.