DESCRIPTION
-----------
This archive contains a simple and readable ANSI C implementation of Huffman
coding and decoding.  This implementation is not intended to be the best,
fastest, smallest, or any other performance related adjective.

More information on Huffman encoding may be found at:
http://michael.dipperstein.com/huffman
http://datacompression.info/Huffman.shtml

FILES
-----
bitarray.c      - Library providing logical operations on arbitrary length
                  arrays of bits.
bitarray.h      - Header for bit array library.
bitfile.c       - Library to allow bitwise reading and writing of files.
bitfile.h       - Header for bitfile library.
canonical.c     - Huffman encoding and decoding routines using canonical codes
COPYING         - Rules for copying and distributing GPL software
COPYING.LESSER  - Rules for copying and distributing LGPL software
huffman.c       - Huffman encoding and decoding routines
huffman.h       - Header file used by code calling library functions
huflocal.h      - Header file with internal library definitions common to both
                  canonical and conventional techniques
huflocal.c      - File with internal library functions common to both
                  canonical and conventional techniques
Makefile        - makefile for this project (assumes gcc compiler and GNU make)
optlist.c       - Source code for GetOptlist function and supporting functions
optlist.h       - Header file to be included by code using the optlist library
README          - this file
sample.c        - Demonstration of how to use Huffman library functions

BUILDING
--------
To build these files with GNU make and gcc, simply enter "make" from the
command line.

USAGE
-----
Usage: sample <options>

options:
  -C : Encode/Decode using a canonical code.
  -c : Encode input file to output file.
  -d : Decode input file to output file.
  -t : Generate code tree for input file to output file.
  -i <filename> : Name of input file.
  -o <filename> : Name of output file.
  -h|?  : Print out command line options.

-C      Uses a canonical Huffman code (canonical.c).

-c      Generates a Huffman tree for the specified input file (see -i) and
        compresses that file using the tree to the specified output file
        (see -o).

-d      Decompresses the specified input file (see -i), writing the results to
        the specified output file (see -o).  Only files compressed by this
        program may be decompressed.

-t      Generates a Huffman tree for the specified input file (see -i) and
        writes the resulting code to the specified output file (see -o).

-i <filename>   The name of the input file.  There is no valid usage of this
                program without a specified input file.

-o <filename>   The name of the output file.  If no file is specified, stdout
                will be used.  NOTE: Sending compressed output to stdout may
                produce undesirable results.

LIBRARY API
-----------
Encoding Data (Traditional or Canonical codes):
int HuffmanEncodeFile(FILE *inFile, FILE *outFile);
int CanonicalEncodeFile(FILE *inFile, FILE *outFile);
inFile
    The file stream to be encoded.  It must be rewindable and opened.
    A NULL pointer results in an error return.
outFile
    The file stream receiving the encoded results.  It must be opened.
    A NULL pointer results in an error return.
Return Value
    Zero for success, -1 for failure.  Error type is contained in errno.
    Files will remain open.

Decoding Data (Traditional or Canonical codes):
int HuffmanDecodeFile(FILE *inFile, FILE *outFile);
int CanonicalDecodeFile(FILE *inFile, FILE *outFile);
inFile
    The file stream to be decoded.  It must be opened.
    A NULL pointer results in an error return.
outFile
    The file stream receiving the decoded results.  It must be opened.
    A NULL pointer results in an error return.
Return Value
    Zero for success, -1 for failure.  Error type is contained in errno.
    Files will remain open.

Displaying a Tree Generated by Algorithm (Traditional or Canonical codes):
int HuffmanShowTree(FILE *inFile, FILE *outFile);
int CanonicalShowTree(FILE *inFile, FILE *outFile);
inFile
    The file stream that a tree will be generated for.  It must be opened.
    A NULL pointer results in an error return.
outFile
    The file stream that the tree will be sent to.  It must be opened.
    A NULL pointer results in an error return.  It is perfectly acceptable
    to use stdout.
Return Value
    Zero for success, -1 for failure.  Error type is contained in errno.
    Files will remain open.

HISTORY
-------
10/23/03  - Corrected errors which occurred when encoding and decoding files
            containing a single symbol.
          - Modified canonical version to dynamically allocate the array used
            to store the canonical list of codes.
          - Use $(OS) environment variable to determine operating system in
            Makefile.
11/20/03  - Correctly handle symbol codes that are over 16 bits long.  These
            changes can handle codes up to 255 bits long.
01/05/04  - Encode EOF along with other symbols
01/12/04  - Use bit stream file library
02/02/04  - Use bit array library
02/25/04  - Make huffman.c and chuffman.c more library like by removing main
            and adding a header file with prototypes for encode/decode
            functions.
          - Add sample usage of functions.
06/14/04  - Use latest version of bitfile.c.
          - Make routines local to huffman.c and chuffman.c static so that they
            may be called from the same program.
          - Include use of canonical and traditional Huffman codes in sample
            program.
05/22/05  - Provides single declaration for items common to canonical and
            traditional encoding source
          - Makefile builds libraries for easier LGPL compliance.
06/21/05  - Corrected BitFileGetBits/PutBits error that accessed an extra byte
            when given an integral number of bytes.
08/29/07  - Explicitly licensed under LGPL version 3.
          - Replaces getopt() with optlist library.
06/08/08  - Incorporates Emanuele Giaquinta's patch to eliminate redundant
            check during the canonical decode process.
08/24/14  - Restructured code for cleaner Traditional/Canonical separation
11/18/14  - Changed the API so that encode and decode routines accept opened
            file streams instead of file names.
          - Changed return value to 0 for success and -1 for failure with
            reason in errno.
          - Upgraded to latest optlist and bitfile libraries.
          - Tighter adherence to Michael Barr's "Top 10 Bug-Killing Coding
            Standard Rules" (http://www.barrgroup.com/webinars/10rules).

TODO
----
- Provide a command line option to change symbol size.
- Implement bijective scheme for ending file on byte boundary without counting
  the number of encoded symbols (http://bijective.dogma.net/).

AUTHOR
------
Michael Dipperstein (mdipper@alumni.engr.ucsb.edu)
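
API USAGE SKETCH
----------------
The LIBRARY API section states that the encode and decode routines take
opened file streams, return 0 or -1 with the reason in errno, and leave the
files open.  The following is a minimal sketch of a caller built on those
rules; it is not taken from sample.c.  RunCodecOnFiles, codec_fn, and
StandInCodec are hypothetical names introduced here for illustration.
StandInCodec only copies bytes and exists so that the sketch builds without
the library.  Opening the files in binary mode and using EINVAL for bad
arguments are assumptions of this sketch, not documented library behavior.

```c
#include <errno.h>
#include <stdio.h>

/* Signature shared by the documented encode/decode routines
 * (HuffmanEncodeFile, CanonicalDecodeFile, and so on). */
typedef int (*codec_fn)(FILE *inFile, FILE *outFile);

/* Hypothetical stand-in with the same signature: copies input to output.
 * Swap in a real library routine when linking against the library. */
int StandInCodec(FILE *inFile, FILE *outFile)
{
    int c;

    if (inFile == NULL || outFile == NULL) {
        errno = EINVAL;     /* assumption: EINVAL is available (POSIX) */
        return -1;
    }

    while ((c = fgetc(inFile)) != EOF) {
        if (fputc(c, outFile) == EOF) {
            return -1;
        }
    }

    return ferror(inFile) ? -1 : 0;
}

/* Hypothetical helper: opens both files, runs the codec, then closes the
 * files, because the library routines leave them open.
 * Returns 0 for success, -1 for failure with the reason in errno. */
int RunCodecOnFiles(const char *inName, const char *outName, codec_fn codec)
{
    FILE *in;
    FILE *out;
    int result;
    int savedErrno;

    if (inName == NULL || outName == NULL || codec == NULL) {
        errno = EINVAL;
        return -1;
    }

    in = fopen(inName, "rb");       /* binary mode: input is raw bytes */
    if (in == NULL) {
        return -1;                  /* errno as left by fopen */
    }

    out = fopen(outName, "wb");
    if (out == NULL) {
        savedErrno = errno;
        fclose(in);
        errno = savedErrno;         /* report the fopen failure, not fclose */
        return -1;
    }

    result = codec(in, out);
    savedErrno = errno;

    fclose(in);
    if (fclose(out) != 0 && result == 0) {
        result = -1;                /* buffered output could not be flushed */
        savedErrno = errno;
    }

    errno = savedErrno;
    return result;
}
```

With huffman.h included and the library linked, a call might look like
RunCodecOnFiles("input.txt", "output.huf", HuffmanEncodeFile), with the
file names chosen by the caller.  Passing the routine as a function pointer
lets one helper serve the traditional and canonical variants alike.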