wooster

ask-wooster: not as smart as asking Jeeves, but we'll give it an honest go!

Installation

The project uses GNU Make targets for housekeeping. Make sure both GNU Make and Python 2.7 are installed and available on the system PATH, and that the virtualenv Python package is installed.

Unzip the documents into the resources/doc_dev folder.
For the dev set: run make preprocess first, then make ask-wooster and make check-wooster.
For the test set: run make preprocess-test first, then make test-wooster.

Workflow

Preprocess every document in the answer corpus into a list of sentences augmented with POS tags and NER classes
Read the questions file into memory and parse it into tokens
For each question, read its entire 100-document folder into memory and parse it into a list of paragraphs (each containing sentences and tokens) annotated with the features from preprocessing
Extract candidate answers from a window around the tokens that carry the NER type we're looking for
Compute a ranking score for each candidate answer
For the top 5 answers, create an answer object consisting of the question ID, the answer text, and the ID of the source document
Write the ranked list of answers to the provided answer file, in the format specified in the instructions
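The candidate-extraction, ranking, and top-5 steps above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes each preprocessed sentence is a list of (word, POS, NER) triples, treats each maximal run of tokens with the target NER tag as one candidate, and uses a hypothetical score (the number of question keywords within 5 tokens on either side). The names Answer, extract_candidates, score, and top_answers are all made up for this example.

```python
from collections import namedtuple

# Hypothetical answer record: question ID, answer text, source document ID.
Answer = namedtuple("Answer", ["question_id", "text", "doc_id"])


def extract_candidates(sentence, target_ner, window=5):
    """Yield (candidate_text, context_words) for each maximal run of tokens
    tagged target_ner. sentence is a list of (word, pos, ner) triples."""
    i, n = 0, len(sentence)
    while i < n:
        if sentence[i][2] != target_ner:
            i += 1
            continue
        j = i
        while j < n and sentence[j][2] == target_ner:
            j += 1
        text = " ".join(word for word, _, _ in sentence[i:j])
        # Context window: up to `window` tokens before and after the run.
        surrounding = sentence[max(0, i - window):i] + sentence[j:j + window]
        yield text, [word.lower() for word, _, _ in surrounding]
        i = j


def score(context, keywords):
    # Hypothetical ranking score: question keywords found in the window.
    return sum(1 for word in context if word in keywords)


def top_answers(question_id, keywords, target_ner, docs, k=5):
    """docs maps doc_id -> list of sentences; returns the k best Answers."""
    scored = []
    for doc_id, sentences in docs.items():
        for sentence in sentences:
            for text, context in extract_candidates(sentence, target_ner):
                scored.append((score(context, keywords), text, doc_id))
    # Highest score first; ties broken by text, then doc_id, for determinism.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [Answer(question_id, text, doc_id) for _, text, doc_id in scored[:k]]


if __name__ == "__main__":
    docs = {
        "DOC-1": [[("Jeeves", "NNP", "PERSON"), ("works", "VBZ", "O"),
                   ("for", "IN", "O"), ("Bertie", "NNP", "PERSON"),
                   ("Wooster", "NNP", "PERSON")]],
    }
    for answer in top_answers(1, {"works", "for"}, "PERSON", docs):
        print("%s %s %s" % (answer.question_id, answer.doc_id, answer.text))
    # prints:
    # 1 DOC-1 Bertie Wooster
    # 1 DOC-1 Jeeves
```

Both candidates in the usage example score 2, so the alphabetical tie-break decides the order; a real ranking score would likely weigh more features than keyword overlap.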

Notable Points

Removes function words from the corpus when scoring relevance
Performs Named Entity Recognition (NER) to make informed guesses at answers
Adds custom, NER-like detection of time types
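The function-word filtering and custom time detection listed above might look roughly like this. It is only a sketch under assumptions: FUNCTION_WORDS is a small illustrative subset (the project's full list comes from the page cited under References), and the DATE, YEAR, and TIME labels and their regular expressions are hypothetical stand-ins for whatever time types the project actually detects.

```python
import re

# Illustrative subset only; the real list is taken from the function-words page.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for",
                  "and", "or", "but", "is", "was", "what", "when", "who"}


def content_words(tokens):
    """Drop function words so only content words count toward relevance."""
    return [t for t in tokens if t.lower() not in FUNCTION_WORDS]


# Hypothetical patterns for time expressions a standard NER tagger may miss.
MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")
TIME_PATTERNS = [
    ("DATE", re.compile(r"\b" + MONTHS + r"\s+\d{1,2}(?:,\s*\d{4})?")),
    ("YEAR", re.compile(r"\b(?:1[0-9]|20)\d{2}\b")),
    ("TIME", re.compile(r"\b\d{1,2}:\d{2}\s*(?:[ap]\.?m\.?)?", re.IGNORECASE)),
]


def detect_time_types(text):
    """Return (label, matched_text) pairs in text order. Earlier patterns win,
    so a year inside an already-matched date is not reported twice."""
    found, taken = [], []
    for label, pattern in TIME_PATTERNS:
        for m in pattern.finditer(text):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # overlaps a span claimed by an earlier pattern
            taken.append((m.start(), m.end()))
            found.append((m.start(), label, m.group(0)))
    found.sort()
    return [(label, match) for _, label, match in found]
```

For example, content_words(["What", "is", "the", "capital", "of", "France"]) keeps only "capital" and "France", and detect_time_types("The meeting on March 3, 1999 began at 10:30 am in London") reports a DATE for "March 3, 1999" and a TIME for "10:30 am".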

References

Function-word list taken from http://www.myenglishpages.com/site_php_files/vocabulary-lesson-function-words.php
Penn Treebank POS tag definitions from https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

WlFranzi/wooster
