
The April 2016 COhPy challenge: word counts


The April 2016 COhPy challenge

The challenge for April is to find the 10 most common words in the book Frankenstein by Mary Shelley, along with how often each one appears. The full text is available from Project Gutenberg:

http://www.gutenberg.org/cache/epub/84/pg84.txt

You can define "word" as you wish. However, your definition of "word" MUST be fully specified in text (code is not considered documentation), and both your definition of "word" and your word counts must be tested using the Python testing framework of your choice: unittest, doctest, py.test, etc.
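As one possible starting point, here is a minimal sketch of a word counter. It assumes a particular definition of "word" that is only an illustration, not the required one: a word is a maximal run of ASCII letters, optionally joined by internal straight apostrophes, compared case-insensitively. The names `WORD_RE`, `tokenize`, and `most_common_words` are hypothetical.

```python
import re
from collections import Counter

# Assumed definition of "word" (illustrative only): a maximal run of ASCII
# letters, optionally joined by internal straight apostrophes, lowercased.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(text):
    """Return the list of words in text under the definition above."""
    return WORD_RE.findall(text.lower())


def most_common_words(text, n=10):
    """Return the n most frequent words as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(n)


if __name__ == "__main__":
    sample = "The monster's rage; the MONSTER fled."
    print(tokenize(sample))
    print(most_common_words(sample, 2))
```

Under this definition, digits, hyphens, and typographic (curly) apostrophes all act as separators, so a real submission would need to state and test how it handles them.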

Bonuses:

  1. Don't use Gutenberg header information as part of your word count.

  2. Extend your code to work with any Gutenberg text, such as From the Earth to the Moon by Jules Verne: http://www.gutenberg.org/cache/epub/83/pg83.txt

  3. Extend your code to work with ANY text.

  4. Extend your code to print any number of the most common words, from 10 to 100 or 1000, etc.

  5. Extend your code to collect other statistics of your choice on the text.

  6. Use a graphing tool or library to graph word frequency.
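
For bonuses 1 to 3, one approach is to cut the text down to the part between Project Gutenberg's start and end marker lines before counting. The sketch below assumes those markers look like `*** START OF THIS PROJECT GUTENBERG EBOOK ... ***` and `*** END OF THIS PROJECT GUTENBERG EBOOK ... ***` (with `THE` sometimes in place of `THIS`). That is an assumption about the file format, so check it against the files you download. The function name `strip_gutenberg_boilerplate` is hypothetical. Text without the markers is returned unchanged apart from surrounding whitespace, so the same function can be applied to any text.

```python
import re

# Assumed marker format; verify against the actual Gutenberg files.
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(text):
    """Return the text between the start and end markers, if present."""
    start = START_RE.search(text)
    begin = start.end() if start else 0
    # Look for the end marker only after the start marker.
    end = END_RE.search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()


if __name__ == "__main__":
    sample = (
        "Title: Frankenstein\n"
        "*** START OF THIS PROJECT GUTENBERG EBOOK FRANKENSTEIN ***\n"
        "You will rejoice to hear...\n"
        "*** END OF THIS PROJECT GUTENBERG EBOOK FRANKENSTEIN ***\n"
        "License text\n"
    )
    print(strip_gutenberg_boilerplate(sample))
```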