pinobatch/240p-test-mini

GB: Compress tiles with nibble Huffman

Closed this issue · 2 comments

I'm to the point where redundancy within a byte is the only redundancy I can see in the Game Boy suite's assets. I plan to convert the CHR data using Huffman coding a nibble (4 bits) at a time. However, this is slow (22 cycles per input bit, maybe 74 per output byte).
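As a rough illustration of coding a nibble at a time, here is a hypothetical Python sketch (not the suite's actual codec) that computes Huffman code lengths over the 16 possible nibble values of a byte stream:

```python
# Hypothetical sketch: Huffman code lengths over nibbles (4-bit symbols),
# one way to exploit redundancy within a byte. Not the project's codec.
from collections import Counter
import heapq

def nibble_lengths(data):
    """Return {nibble: code length} for the nibbles of `data`."""
    freq = Counter()
    for b in data:
        freq[b >> 4] += 1
        freq[b & 0x0F] += 1
    if len(freq) == 1:  # degenerate: only one distinct nibble
        return {next(iter(freq)): 1}
    # Standard Huffman build; heap entries are (weight, tiebreak, symbols)
    heap = [(w, s, [s]) for s, w in freq.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freq}
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge deepens these symbols by one
        heapq.heappush(heap, (w1 + w2, min(syms1 + syms2), syms1 + syms2))
    return lengths
```

With at most 16 symbols, both the length table and a canonical code table stay tiny, which is what makes a per-nibble header cheap to store.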

The next steps:

  • See if explicit tree Huffman is any faster to decode than the canonical Huffman we're using
  • Design bitstream header containing Huffman tree and length
  • Compress font
  • Design a file system for all assets
  • Decompress all CHR in advance
  • Shadow sprite: Draw frame type using sprites
  • Shadow sprite: Keep GHZ and Gus loaded in VRAM
  • Compress CHR outside IU files
  • Compress CHR inside IU files
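For reference on the first step, canonical (codes-per-length based) decoding can be sketched as below; the bit order and table layout here are assumptions for illustration, not the project's actual bitstream format:

```python
# Sketch of canonical Huffman decoding from a codes-per-length table.
# counts[i] = number of codes of length i+1; symbols are listed in
# canonical order. Bit order is an assumption, not the real format.
def decode_canonical(bits, counts, symbols):
    it = iter(bits)
    out = []
    try:
        while True:
            code = first = index = 0
            for count in counts:
                code = (code << 1) | next(it)
                if code - first < count:          # code falls in this length
                    out.append(symbols[index + code - first])
                    break
                index += count                    # skip symbols of this length
                first = (first + count) << 1      # first code of next length
            else:
                raise ValueError("invalid bitstream")
    except StopIteration:
        return out
```

An explicit-tree decoder replaces the per-length arithmetic with one pointer chase per bit, which is why the two end up costing about the same per input bit.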

Current memory layout

  • $8000-$87BF: Sprites in various shadow states
  • $8D80-$8DFF: All 5 frame type names
  • $9000-$97FF: Gus
  • $9000-$966F: Hill zone
  • $9000-$900F: Striped tile
  • $9800-$9A3F: All tilemaps
  • $9C00-$9C07: Frame type tilemap

Desired memory layout

  • $8000-$866F: Hill zone
  • $8780-$87EF: Name of (only) the current frame type
  • $8800-$8FBF: Sprites in various shadow states
  • $8FF0-$8FFF: Striped tile
  • $9000-$97FF: Gus
  • $9800-$9A3F: Tilemap (Hill zone)
  • $9C00-$9C9F: Tilemap (Striped)
  • $9DC0-$9FFF: Tilemap (Gus)
  • Frame type drawn using sprites instead

Huffman takes a lot longer to decompress than the codecs I had been using. Decompressing the font alone (two blocks totaling 936 bytes) takes 0x5A2FA = 369402 clocks of BGB's time base of 2.097 MHz (one unit per two dots), or 176 ms. This implies a rate of about 200 mcycles per byte, or 5.3 kB/s. This can make it painful for the user to switch among test patterns that use a lot of compressed data. And based on my calculations, explicit tree Huffman isn't appreciably faster than canonical (codes-per-length based) Huffman.

One way around it is to do what Super Mario World does: decompress most assets to otherwise unused WRAM while the game sits on the copyright page. Then switching among test patterns remains as fast as it is now.

We also want to compress only the first part of each IU file (the CHR ROM). So the tools will need to read all affected .chr.pb16 and .iu files and separate them into CHR (to be compressed) and the remainder (to be included literally). So we'll probably need to bring all non-menu assets that the Huffman coder touches into one Python program, rather than relying on Make and its problems with parallelizing jobs that have multiple output files.
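The tool-side split might look something like this; the file names, the `chr_lengths` table, and the idea that CHR length is known per asset are all placeholders, not the project's actual build layout:

```python
# Hypothetical sketch of separating each asset into a CHR part (to be
# Huffman-coded) and a literal remainder. The per-file CHR length table
# is an assumption; the real .iu format may differ.
from pathlib import Path

def split_iu(path: Path, chr_length: int):
    """Return (chr_part, remainder) of one asset file."""
    data = path.read_bytes()
    return data[:chr_length], data[chr_length:]

def collect_assets(paths, chr_lengths):
    """Gather CHR segments for one compression pass, keeping the
    literal remainders alongside for reassembly."""
    to_compress, literals = [], []
    for p in paths:
        chr_part, rest = split_iu(p, chr_lengths[p.name])
        to_compress.append(chr_part)
        literals.append(rest)
    return to_compress, literals
```

Doing this in one Python program also sidesteps Make's trouble with rules that emit multiple output files.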