Exercise 6: Files and dicts

Files can also be iterated line-by-line, using a for loop on the file directly.

For example:
    twain = open('twain.txt')
    for line in twain:
        # Do something with each line, e.g. print it
        print(line)
    twain.close()

The loop variable 'line' will store each line of the file in turn, including
its trailing newline character.
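
The newline at the end of each line usually has to be stripped before the
text is used. A minimal sketch of one way to do that, assuming a small
throwaway file named sample.txt (a hypothetical name, created here only so
the example runs on its own) and a with block so the file is closed
automatically:

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of the file at path, without trailing newlines."""
    lines = []
    # The with block closes the file once the loop has finished.
    with open(path) as handle:
        for line in handle:
            lines.append(line.rstrip('\n'))
    return lines


# Create a throwaway file (hypothetical name) for the demonstration.
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(sample_path, 'w') as handle:
    handle.write('first line\nsecond line\n')

print(read_lines(sample_path))
```

The helper name read_lines is illustrative only; the same loop works on
'twain.txt' or any other text file.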

Dictionaries can also be iterated entry-by-entry, using the method items().

For example:
    my_dict = {'a': 1, 'b': 2, 'c': 3}
    for key, value in my_dict.items():
        print("Key == %r, value == %r" % (key, value))

This prints the following (the order of the entries may vary):

    Key == 'a', value == 1
    Key == 'b', value == 2
    Key == 'c', value == 3

This introduces two loop variables, 'key' and 'value', that will store the key
and value elements of each dictionary entry in turn.
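
Before Python 3.7, dictionaries did not guarantee any particular iteration
order. A minimal sketch, assuming a repeatable order is wanted, that visits
the entries by sorted key instead (the helper name sorted_entries is
hypothetical):

```python
def sorted_entries(d):
    """Return the (key, value) pairs of d, ordered by key."""
    return [(key, d[key]) for key in sorted(d)]


my_dict = {'b': 2, 'c': 3, 'a': 1}
for key, value in sorted_entries(my_dict):
    print("Key == %r, value == %r" % (key, value))
```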

Further reading:

    * http://learnpythonthehardway.org/book/ex39.html
    * http://www.learnpython.org/page/Dictionaries
    * http://docs.python.org/library/stdtypes.html#string-methods
    * http://docs.python.org/library/stdtypes.html#mapping-types-dict

Problem Description

Write a program, wordcount.py, that opens a file named on the command
line and counts how many times each space-separated word occurs in
that file. Your program should then print those counts to the
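screen. For example, suppose inputfile.txt contains:

    As I was going to St. Ives
    I met a man with seven wives
    Every wife had seven sacks
    Every sack had seven cats
    Every cat had seven kits
    Kits, cats, sacks, wives.
    How many were going to St. Ives?

Running the program on that file prints each word with its count (the
order of the lines may vary):

    $ python wordcount.py inputfile.txt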
    seven 4
    Kits, 1
    sack 1
    As 1
    kits 1
    Ives? 1
    How 1
    St. 2
    had 3
    sacks, 1
    to 2
    going 2
    was 1
    cats, 1
    wives 1
    met 1
    Every 3
    with 1
    man 1
    a 1
    wife 1
    I 2
    many 1
    cat 1
    Ives 1
    sacks 1
    wives. 1
    were 1
    cats 1

You may find the string and dictionary methods documented in the links above
useful.

We have provided a file 'twain.txt' for you to test your code on.
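
Two building blocks for the problem can be sketched as follows. This assumes
the file name is the first command-line argument (sys.argv[1] in a real
script) and that "space-separated" means splitting on any run of whitespace
with the string method split(). The function names are illustrative and not
part of the exercise:

```python
def filename_from_args(argv):
    """Return the file name given on the command line, or None if missing."""
    # argv[0] is the script name; argv[1] is the first real argument.
    if len(argv) < 2:
        return None
    return argv[1]


def words_in_line(line):
    """Split a line into whitespace-separated words."""
    # split() with no arguments also discards the trailing newline.
    return line.split()


# In a real script you would pass sys.argv (after "import sys") instead.
print(filename_from_args(['wordcount.py', 'inputfile.txt']))
print(words_in_line('As I was going to St. Ives\n'))
```

Counting the words is left to you; a dictionary keyed by word is a natural
place to keep the totals.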

Extra Credit

The output of your program is not as nice as it could be. Try to improve it:

    * Some words are counted separately due to punctuation. Remove punctuation
      so that they appear as the same word in the output.

    * In the example above, 'Kits' and 'kits' are treated separately because they
      have different capitalization. Make all words lowercase so that
      capitalization doesn't matter.

    * Sort the output from the highest-frequency words to the lowest-frequency
      words.

    * Sort words having the same frequency alphabetically.
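
The improvements above could be approached along these lines. This is only a
sketch under some assumptions: punctuation is removed from the ends of each
word only (so an apostrophe inside a word is kept), the counts are already
stored in a dictionary, and the names normalize, sort_counts and
example_counts are hypothetical, with made-up counts for the demonstration:

```python
import string


def normalize(word):
    """Lowercase a word and strip punctuation from both ends."""
    return word.strip(string.punctuation).lower()


def sort_counts(counts):
    """Order (word, count) pairs by descending count, then alphabetically."""
    # Negating the count sorts high counts first; the word breaks ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up counts, used only to show the ordering.
example_counts = {'kits': 2, 'seven': 4, 'cats': 2, 'had': 3}
for word, count in sort_counts(example_counts):
    print("%s %d" % (word, count))
```

Sorting on the tuple (-count, word) handles both orderings in a single call
to sorted(), which avoids sorting twice.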