Given a plain text file, analyze the frequency of ordered phrases, grouped by phrase length (number of words per phrase), e.g. find the most common 3-word phrase.
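The idea above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name `most_common_phrases`, its parameters, and the tokenizing rule (lowercase; keep letters, digits, and apostrophes) are all assumptions.

```python
from collections import Counter
import re

def most_common_phrases(text, n, top=3):
    """Return up to `top` (phrase, count) pairs for ordered n-word phrases."""
    # Assumed tokenizing rule: lowercase, keep letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of n words across the text to build each ordered phrase.
    phrases = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(phrases).most_common(top)

sample = "the cat sat on the mat and the cat sat on the rug"
print(most_common_phrases(sample, 3))
```

On the sample sentence, "the cat sat", "cat sat on", and "sat on the" each occur twice. A text shorter than `n` words simply yields an empty list.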
Future development suggestions:
- Send output to a file instead of the console, for more flexibility
- Add HTML parsing to analyze the displayed text of webpages
- Add a comparison option/parameter for arbitrary phrase lengths, e.g. compare the most common 3-word phrase against the most common 4-word phrase in the same file
- Add comparison between files, e.g. the most common 3-word phrases from each of two (or more) files
- Combine the two comparison features above: advanced inter-file comparisons
- Advanced linguistic parsing to ignore extremely common words, e.g. "the", "a", "it's", to allow more effective mining of significant phrases
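
As a rough illustration of the last suggestion, common words could be dropped before phrases are built. Everything here is an assumption made for the sketch: the name `significant_phrases`, the tiny `STOP_WORDS` set, and the choice to filter words before (rather than after) forming phrases, which means words that were separated by a stop word become adjacent.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "it's", "and", "on"}

def significant_phrases(text, n, top=3, stop_words=STOP_WORDS):
    """Count ordered n-word phrases after removing stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Drop stop words first, so the remaining words form the phrases.
    kept = [w for w in words if w not in stop_words]
    phrases = (" ".join(kept[i:i + n]) for i in range(len(kept) - n + 1))
    return Counter(phrases).most_common(top)

sample = "the cat sat on the mat and the cat sat on the rug"
print(significant_phrases(sample, 2))
```

With the stop words removed, the sample reduces to "cat sat mat cat sat rug", and "cat sat" is the only 2-word phrase that occurs twice.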