Trumango is a basic question answering system based on pattern matching.
Trumango reads questions from one file, with one question per line, and reads the text to search for answers from another file. The text must be written on a single line.
With an apathetic shrug, what does Truman replace?
He picks up the framed picture of his wife from where?
The sound of the children triggers what in his head?
...
By default, the questions file is questions.txt and the text file is the_truman_show_script.txt.
Question and text files can be specified using flags.
Show usage and flag descriptions: ./trumango --help
Use custom question and text files: ./trumango -q my_questions.txt -t my_text.txt
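The flag handling described above could be sketched with Go's standard flag package. This is only an illustration, not Trumango's actual source: the config type, the parseArgs helper, and the help strings are hypothetical, while the -q and -t flags and their defaults come from the usage shown above.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// config holds the two input paths (hypothetical type, for illustration).
type config struct {
	questionsPath string
	textPath      string
}

// parseArgs parses -q and -t, falling back to the documented defaults.
func parseArgs(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("trumango", flag.ContinueOnError)
	fs.StringVar(&cfg.questionsPath, "q", "questions.txt", "questions file, one question per line")
	fs.StringVar(&cfg.textPath, "t", "the_truman_show_script.txt", "text file, written on a single line")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseArgs(os.Args[1:])
	if errors.Is(err, flag.ErrHelp) {
		os.Exit(0) // usage has already been printed by the flag package
	}
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("questions file:", cfg.questionsPath)
	fmt.Println("text file:", cfg.textPath)
}
```

Using a FlagSet rather than the package-level flags keeps the parsing logic testable without touching os.Args.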
The nlp module contains the following functions for text processing:
Stem wraps the stemming functionality of the porter2 stemmer.
Stem finds the stem of each word using the porter2 stemmer, then reassembles the sentence.
ClearStopWords uses the stopwords library to clear stop words.
SplitSentences uses the segmentation functionality of prose to split a text into sentences accurately.
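To make the stop-word step concrete, here is a minimal sketch that uses only the standard library. It does not call the stopwords library that Trumango uses: the stopWords list is a tiny hand-picked assumption, and clearStopWords is a hypothetical helper name. It also does no stemming.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is a tiny hand-picked list for illustration only; a real
// stop-word library ships a much larger, language-aware list.
var stopWords = map[string]bool{
	"the": true, "of": true, "what": true, "in": true, "his": true,
}

// clearStopWords lowercases the sentence, strips surrounding punctuation
// from each word, and drops every word found in stopWords.
func clearStopWords(sentence string) string {
	var kept []string
	for _, w := range strings.Fields(strings.ToLower(sentence)) {
		w = strings.Trim(w, ".,;:!?\"'")
		if w != "" && !stopWords[w] {
			kept = append(kept, w)
		}
	}
	return strings.Join(kept, " ")
}

func main() {
	fmt.Println(clearStopWords("The sound of the children triggers what in his head?"))
	// prints "sound children triggers head"
}
```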
The horspool module contains the following functions for pattern matching with the Horspool algorithm, which has O(n) average-case time complexity:
Find finds the index of the first occurrence of the pattern in the text using the Horspool algorithm.
FindLast finds the index of the last occurrence of the pattern in the text using the Horspool algorithm.
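The following is a minimal sketch of how Find and FindLast could be written with the Horspool algorithm. It is not the module's actual source. It assumes byte-wise matching, a return value of -1 when there is no match or the pattern is empty, and a FindLast that simply searches the reversed inputs. The shiftTable and reverse helpers are hypothetical.

```go
package main

import "fmt"

// shiftTable builds the Horspool bad-character table: for the byte aligned
// with the pattern's last position, how far the window may safely shift.
func shiftTable(pattern string) [256]int {
	var table [256]int
	m := len(pattern)
	for i := range table {
		table[i] = m // bytes absent from the pattern allow a full shift
	}
	for i := 0; i < m-1; i++ {
		table[pattern[i]] = m - 1 - i
	}
	return table
}

// Find returns the byte index of the first occurrence of pattern in text,
// or -1 if there is none (an empty pattern also yields -1 in this sketch).
func Find(text, pattern string) int {
	n, m := len(text), len(pattern)
	if m == 0 || m > n {
		return -1
	}
	table := shiftTable(pattern)
	for pos := 0; pos <= n-m; pos += table[text[pos+m-1]] {
		if text[pos:pos+m] == pattern {
			return pos
		}
	}
	return -1
}

// reverse returns s with its bytes in reverse order.
func reverse(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// FindLast returns the byte index of the last occurrence of pattern in text
// by running Find on the reversed inputs and mapping the index back.
func FindLast(text, pattern string) int {
	i := Find(reverse(text), reverse(pattern))
	if i < 0 {
		return -1
	}
	return len(text) - len(pattern) - i
}

func main() {
	text := "the sound of the children"
	fmt.Println(Find(text, "children")) // 17
	fmt.Println(Find(text, "the"))      // 0
	fmt.Println(FindLast(text, "the"))  // 13
}
```

The shift is always at least one byte, so the loop terminates. The worst case remains O(nm), for example with highly repetitive text and patterns.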
The util module contains the Difference function, which finds the difference of a given string array A from a given string array B.
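A minimal sketch of such a function follows. It assumes that "the difference of A from B" means the elements of A that do not appear in B, with their order preserved. The signature is an assumption and may differ from the module's actual one.

```go
package main

import "fmt"

// Difference returns the elements of a that do not appear in b,
// keeping the order in which they occur in a.
func Difference(a, b []string) []string {
	inB := make(map[string]struct{}, len(b))
	for _, s := range b {
		inB[s] = struct{}{}
	}
	var diff []string
	for _, s := range a {
		if _, found := inB[s]; !found {
			diff = append(diff, s)
		}
	}
	return diff
}

func main() {
	sentence := []string{"sound", "children", "trigger", "memori", "head"}
	question := []string{"sound", "children", "trigger", "head"}
	fmt.Println(Difference(sentence, question)) // [memori]
}
```

Building a set from B first keeps the lookup cost per element of A constant, instead of scanning B for every element.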
Trumango loads the specified files, then splits the text into sentences using prose. Prose has extensive, advanced features, such as tokenizing a sentence and tagging each word as a verb, noun, etc., but this project uses it only to split the text into sentences more accurately.
Once the input is processed, each question has its stop words cleared and its remaining words stemmed. The question is then split into words, and each word is searched for in the text's sentences to find the sentence that matches the most words.
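The sentence selection step can be sketched as a simple scoring loop. This is an illustration under assumptions, not the project's code: bestMatch is a hypothetical helper, strings.Contains stands in for the horspool module's Find, ties go to the earliest sentence, and the first sample sentence is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// bestMatch returns the index of the sentence that contains the most
// question words, or -1 if no sentence contains any of them. Both inputs
// are expected to be already cleared of stop words and stemmed.
func bestMatch(questionWords, sentences []string) int {
	best, bestScore := -1, 0
	for i, sentence := range sentences {
		score := 0
		for _, word := range questionWords {
			// strings.Contains is a stand-in for a Horspool-based search.
			if strings.Contains(sentence, word) {
				score++
			}
		}
		if score > bestScore {
			best, bestScore = i, score
		}
	}
	return best
}

func main() {
	sentences := []string{
		"children play street",               // made-up sample sentence
		"sound children trigger memori head", // stemmed sentence from this README
	}
	question := []string{"sound", "children", "trigger", "head"}
	fmt.Println(bestMatch(question, sentences)) // 1
}
```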
Once the answer sentence for each question is found, the exact answer is extracted using the map of questions to answer sentences. The answer sentence is also cleared and stemmed; to recover the original form of a stemmed word, a map from stemmed words to original words is built during stemming. Taking the difference of the cleared and stemmed sentence words from the cleared and stemmed question words then leaves the exact answer words.
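Question: The sound of the children triggers what in his head?
Cleared and stemmed question: sound children trigger head
Answer sentence: The sound of the children triggers a memory in his head.
Cleared and stemmed sentence: sound children trigger memori head
Difference: memori
Answer: memory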
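The example above can be reproduced with a short sketch of the final extraction step. It is written under assumptions: extractAnswer is a hypothetical helper, the stems are supplied by hand instead of coming from the porter2 stemmer, and a stem missing from the map is returned unchanged.

```go
package main

import "fmt"

// extractAnswer returns the original forms of the sentence stems that do not
// occur among the question stems. originals maps each stem back to the word
// it was derived from; a stem without an entry is returned as-is.
func extractAnswer(questionStems, sentenceStems []string, originals map[string]string) []string {
	inQuestion := make(map[string]bool, len(questionStems))
	for _, stem := range questionStems {
		inQuestion[stem] = true
	}
	var answer []string
	for _, stem := range sentenceStems {
		if inQuestion[stem] {
			continue
		}
		if word, ok := originals[stem]; ok {
			answer = append(answer, word)
		} else {
			answer = append(answer, stem)
		}
	}
	return answer
}

func main() {
	question := []string{"sound", "children", "trigger", "head"}
	sentence := []string{"sound", "children", "trigger", "memori", "head"}
	// Built during stemming in the description above; filled by hand here.
	originals := map[string]string{"memori": "memory", "trigger": "triggers"}
	fmt.Println(extractAnswer(question, sentence, originals)) // [memory]
}
```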