/HuffmanTextFileCompressor

HuffmanEncoding Text File Compressor Project in CS245:Data Structures and Algorithms at The University of San Francisco

Primary LanguageJava

Readme from SPRING16 CS245: Data Structures and Algorithms. Instructor David Galles.

For your second project, you will write a program that compresses and uncompresses files using Huffman coding. To compress a file, your program will follow the following steps:

•	Read in the entire input file, and calculate the frequencies of all characters.

•	Build a Huffman tree for all characters that appear in the input file (characters that do not appear in the input file should not appear in your Huffman tree)

•	Build a lookup table, which contains the codes for all characters in the input file

•	Check to see if the compressed file would be smaller than the original file. If not, stop -- don't do any compression. Print out a message instead that the file cannot be compressed

•	If the compressed file will be smaller, create the encoded file:

◦	Print out a "Magic Number", which will be used to guard against uncompressing files that we didn't compress

◦	Print out the Huffman tree to the output file

◦	Use the lookup table to encode the file

To uncompress a file, your program will follow the following steps:

•	Read in the "Magic Number", and make sure that it matches the number for the this program (exiting if it does not match)

•	Read in the Huffman tree from the input file

•	Decode the input, using the Huffman tree

If your program is called with the verbose'' flag (-v), you will also need to print some debugging information to standard out. If your program is called with the force'' flag (-f), then the file will be compressed even if the compressed file would be larger than the original file.

Your program should expect to be called as follows:

% java Huffman (-c|-u) [-v] [-f]  infile outfile

where:

•	(-c|-u) stands for either "-c" (for compress), or "-u"(for uncompress)

•	[-v] stands for an optional "-v" flag (for verbose)

•	[-f] stands for an optional "-f" flag, that forces compression even if the compressed file will be larger than the original file

•	infile is the input file

•	outfile is the output file

The flags -f and -v can be in either order.  So, the following would all be legal:

•	java Huffman -c test test.huff

•	java Huffman -c -v myTestFile myCompressedFile

•	java Huffman -c -f -v test test.huff

•	java Huffman -u -f test1.huff test2

•	java Huffman -u -f -v test1.huff test2

If a file is compressed with the "-v" option, you should print the following to standard output (using System.out.print(ln)):

•	The frequency of each character in the input file (print the ASCII values of the characters, instead of the characters themselves, to make this more readable for binary files)

•	The Huffman tree (see class notes on printing trees for pointers on how this can be done)

•	The Huffman codes for each character that has a code (characters which do not appear in the input file will not have codes.  Again, print the ASCII values of characters instead of the characters themselves)

•	The size of the uncompressed file and the size of the compressed file

If a file is uncompressed with the "-v" option, you should print out following to standard output (using System.out.print(ln)):

•	The Huffman tree (see class notes on printing trees for pointers on how this can be done)