pinobatch/240p-test-mini

GB: Compress tiles with nibble Huffman

Closed this issue · 2 comments

I'm to the point where redundancy within a byte is the only redundancy I can see in the Game Boy suite's assets. I plan to convert the CHR data using Huffman coding a nibble (4 bits) at a time. However, this is slow (22 cycles per input bit, maybe 74 per output byte).
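As a rough illustration of coding a nibble at a time, here is a hypothetical Python sketch (not the suite's actual codec) that computes Huffman code lengths over the 16 possible nibble values of a byte stream:

```python
# Hypothetical sketch: Huffman code lengths over nibbles (4-bit symbols),
# one way to exploit redundancy within a byte. Not the project's codec.
from collections import Counter
import heapq

def nibble_lengths(data):
    """Return {nibble: code length} for the nibbles of `data`."""
    freq = Counter()
    for b in data:
        freq[b >> 4] += 1
        freq[b & 0x0F] += 1
    if len(freq) == 1:  # degenerate: only one distinct nibble
        return {next(iter(freq)): 1}
    # Standard Huffman build; heap entries are (weight, tiebreak, symbols)
    heap = [(w, s, [s]) for s, w in freq.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freq}
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge deepens these symbols by one
        heapq.heappush(heap, (w1 + w2, min(syms1 + syms2), syms1 + syms2))
    return lengths
```

With at most 16 symbols, both the length table and a canonical code table stay tiny, which is what makes a per-nibble header cheap to store.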

The next steps:

  • See if explicit tree Huffman is any faster to decode than the canonical Huffman we're using
  • Design bitstream header containing Huffman tree and length
  • Compress font
  • Design a file system for all assets
  • Decompress all CHR in advance
  • Shadow sprite: Draw frame type using sprites
  • Shadow sprite: Keep GHZ and Gus loaded in VRAM
  • Compress CHR outside IU files
  • Compress CHR inside IU files
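For reference on the first step, canonical (codes-per-length based) decoding can be sketched as below; the bit order and table layout here are assumptions for illustration, not the project's actual bitstream format:

```python
# Sketch of canonical Huffman decoding from a codes-per-length table.
# counts[i] = number of codes of length i+1; symbols are listed in
# canonical order. Bit order is an assumption, not the real format.
def decode_canonical(bits, counts, symbols):
    it = iter(bits)
    out = []
    try:
        while True:
            code = first = index = 0
            for count in counts:
                code = (code << 1) | next(it)
                if code - first < count:          # code falls in this length
                    out.append(symbols[index + code - first])
                    break
                index += count                    # skip symbols of this length
                first = (first + count) << 1      # first code of next length
            else:
                raise ValueError("invalid bitstream")
    except StopIteration:
        return out
```

An explicit-tree decoder replaces the per-length arithmetic with one pointer chase per bit, which is why the two end up costing about the same per input bit.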

Current memory layout

  • $8000-$87BF: Sprites in various shadow states
  • $8D80-$8DFF: All 5 frame type names
  • $9000-$97FF: Gus
  • $9000-$966F: Hill zone
  • $9000-$900F: Striped tile
  • $9800-$9A3F: All tilemaps
  • $9C00-$9C07: Frame type tilemap

Desired memory layout

  • $8000-$866F: Hill zone
  • $8780-$87EF: Name of (only) the current frame type
  • $8800-$8FBF: Sprites in various shadow states
  • $8FF0-$8FFF: Striped tile
  • $9000-$97FF: Gus
  • $9800-$9A3F: Tilemap (Hill zone)
  • $9C00-$9C9F: Tilemap (Striped)
  • $9DC0-$9FFF: Tilemap (Gus)
  • Frame type drawn using sprites instead

Huffman takes a lot longer to decompress than the codecs I had been using. Decompressing the font alone (two blocks totaling 936 bytes) takes 0x5A2FA = 369402 clocks of BGB's time base of 2.097 MHz (one unit per two dots), or 176 ms. This implies a rate of about 200 mcycles per byte, or 5.3 kB/s. This can make it painful for the user to switch among test patterns that use a lot of compressed data. And based on my calculations, explicit tree Huffman isn't appreciably faster than canonical (codes-per-length based) Huffman.

One way around it is to do what Super Mario World does: decompress most assets to otherwise unused WRAM while the game sits on the copyright page. Then switching among test patterns remains as fast as it is now.

We also want to compress only the first part of each IU file (the CHR ROM). So the tools will need to read all affected .chr.pb16 and .iu files and separate them into CHR (to be compressed) and the remainder (to be included literally). So we'll probably need to bring all non-menu assets that the Huffman coder touches into one Python program, rather than relying on Make and its problems with parallelizing jobs that have multiple output files.
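The tool-side split might look something like this; the file names, the `chr_lengths` table, and the idea that CHR length is known per asset are all placeholders, not the project's actual build layout:

```python
# Hypothetical sketch of separating each asset into a CHR part (to be
# Huffman-coded) and a literal remainder. The per-file CHR length table
# is an assumption; the real .iu format may differ.
from pathlib import Path

def split_iu(path: Path, chr_length: int):
    """Return (chr_part, remainder) of one asset file."""
    data = path.read_bytes()
    return data[:chr_length], data[chr_length:]

def collect_assets(paths, chr_lengths):
    """Gather CHR segments for one compression pass, keeping the
    literal remainders alongside for reassembly."""
    to_compress, literals = [], []
    for p in paths:
        chr_part, rest = split_iu(p, chr_lengths[p.name])
        to_compress.append(chr_part)
        literals.append(rest)
    return to_compress, literals
```

Doing this in one Python program also sidesteps Make's trouble with rules that emit multiple output files.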