gbdev/rgbds

RGBGFX segfault

Closed this issue · 6 comments

Discord report: https://discord.com/channels/303217943234215948/870005582042628196/1336507100620132382

segfault.png

The gist of it seems to be that embedded palettes are not read properly, and an image with an embedded palette that uses color #0 on any pixel causes the code that displays a palette to segfault. Here's the output for rgbgfx -o segfault.2bpp -c embedded segfault.png:

error: Failed to fit tile colors [$1527, $15cc, $1ab4] in specified palettes
note: The following palette was specified:
Segmentation fault (core dumped)

If I modify the image to not use color #0 from the embedded palette, I get a similar failure but with no segfault, so the code successfully prints... no palette:

error: Failed to fit tile colors [$1527, $15cc, $1ab4] in specified palettes
note: The following palette was specified:
        []
Conversion aborted after 1 error

Note that GItHub re-encodes the image, so:

% xxd segfault.png 
00000000: 8950 4e47 0d0a 1a0a 0000 000d 4948 4452  .PNG........IHDR
00000010: 0000 0060 0000 0008 0803 0000 0031 e10a  ...`.........1..
00000020: 3d00 0000 0173 5247 4200 aece 1ce9 0000  =....sRGB.......
00000030: 000c 504c 5445 6ce0 40a0 a830 6070 2838  ..PLTEl.@..0`p(8
00000040: 4828 279c d804 0000 0004 7452 4e53 00ff  H('.......tRNS..
00000050: ffff b32d 4088 0000 008f 4944 4154 2891  ...-@.....IDAT(.
00000060: b552 5b12 c420 0833 e4fe 77de 8108 52b5  .R[.. .3..w...R.
00000070: d3af 85b6 60a2 3c2a 63b8 189f 6237 70e3  ....`.<*c...b7p.
00000080: dfe8 38ab f3e5 0e02 6028 4002 0c1e 5a27  ..8.....`(@...Z'
00000090: b6f1 0b6f d6b5 8a93 3b93 f91e 7dfc 5100  ...o....;...}.Q.
000000a0: 0a93 b2d9 e44f 3c13 28ce ea40 0066 82ac  .....O<.(..@.f..
000000b0: 905e 25ab 0bbd 8d9f c574 eb4e c47b 3690  .^%......t.N.{6.
000000c0: 19e7 4ee2 e8a0 7b8b dff5 ab03 5695 5677  ..N...{.....V.Vw
000000d0: 907a f26d 9dff 7fbc dec1 3612 761d adcf  .z.m......6.v...
000000e0: 21b3 c0ae 53f4 47f9 0125 ba04 bca2 595c  !...S.G..%....Y\
000000f0: eb00 0000 0e65 5849 664d 4d00 2a00 0000  .....eXIfMM.*...
00000100: 0800 0000 0000 0000 d253 9300 0000 0049  .........S.....I
00000110: 454e 44ae 4260 82                        END.B`.

Note that this bug does not occur with the Rust rewrite:

$ target/debug/rgbgfx -c embedded -o segfault.2bpp segfault.png 
error: Some tiles cannot be displayed with the specified palettes

  │ Note: The following palettes were specified:
  │         [transparent, $1ab4, $15cc]
  │       No palette contains colors [$1527, $15cc, $1ab4]
──╯

I'm not sure if the "Failed to fit tile colors"/"Some tiles cannot be displayed" error is appropriate though. The embedded palette specifies four colors, with transparency as a fifth:

% pngcheck -p segfault.png
File: segfault.png (279 bytes)
  PLTE chunk: 4 palette entries
    0:  (108,224, 64) = (0x6c,0xe0,0x40)
    1:  (160,168, 48) = (0xa0,0xa8,0x30)
    2:  ( 96,112, 40) = (0x60,0x70,0x28)
    3:  ( 56, 72, 40) = (0x38,0x48,0x28)
  tRNS chunk: 4 transparency entries
    0:    0 = 0x00
    1:  255 = 0xff
    2:  255 = 0xff
    3:  255 = 0xff
OK: segfault.png (96x8, 8-bit palette+trns, non-interlaced, 63.7%).

(Note that those four RGB888 palette entries are equivalent to GBC [$238d, $1ab4, $15cc, $1527].)

The actual image clearly uses four colors: three from the palette (indexes 1, 2, and 3) plus transparent. So the question is, should "transparent" be counted as the unused color at index 0, or should it fail because -c embedded says "use this exact palette" and transparent is not the same color as (108,224, 64)?

This also makes me question the Rust output: why is it claiming that the specified palette is [transparent, $1ab4, $15cc] instead of [$238d, $1ab4, $15cc, $1527] (the embedded PLTE) or [transparent, $1ab4, $15cc, $1527] (the four actually-used colors)?

Here's the abbreviated make develop ASan output:

$ ./rgbgfx -c embedded -o segfault.2bpp segfault.png
=================================================================
==22676==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000298 at pc 0x00010d93b93a bp 0x7ffee2333790 sp 0x7ffee2333788
READ of size 2 at 0x602000000298 thread T0
    #0 0x10d93b939 in _ std::find<_>(_) algorithm:929
    #1 0x10d93b6e5 in makePalsAsSpecified(_)::$_0::operator()(_) const::'lambda'(_)::operator()(_) const process.cpp:633
    #2 0x10d93b481 in bool std::all_of<_>(_) algorithm:855
    #3 0x10d93b302 in makePalsAsSpecified(_)::$_0::operator()(_) const process.cpp:632
    #4 0x10d935bd1 in _ std::find_if<_>(_) algorithm:943
    #5 0x10d92c4cd in makePalsAsSpecified(_) process.cpp:630
    #6 0x10d92dd0d in process() process.cpp:1233
    #7 0x10d8a59e8 in main main.cpp:881
    #8 0x7fff208e2f3c in start+0x0 (libdyld.dylib:x86_64+0x15f3c)

0x602000000298 is located 0 bytes to the right of 8-byte region [0x602000000290,0x602000000298)
allocated by thread T0 here:
    ...
    #8 0x10d92c2a6 in makePalsAsSpecified(_) process.cpp:605
    #9 0x10d92dd0d in process() process.cpp:1233
    #10 0x10d8a59e8 in main main.cpp:881
    #11 0x7fff208e2f3c in start+0x0 (libdyld.dylib:x86_64+0x15f3c)

This change to makePalsAsSpecified in process.cpp is sufficient to match the Rust output:

-               for (size_t i = 0; i < options.nbColorsPerPal && (!spec[i] || spec[i]->isOpaque()); ++i) {
+               for (size_t i = 0; i < options.nbColorsPerPal; ++i) {

(Although the specified palette does not list transparent as its first color, since rgbgfx listColors never supported that anyway.)

This change also breaks the at-file-ref.png test case, because color $8000 (transparent) does get listed in its specified palettes:

Testing: ../../rgbgfx @at-file-ref.flags at-file-ref.png
--- at-file-ref.err	2025-02-15 06:34:54.000000000 -0500
+++ /var/folders/yt/6lmc7y552t38tg31nf5jq_y40000gp/T/tmp.opb8HreA	2025-04-23 01:19:04.000000000 -0400
@@ -2,7 +2,7 @@
 note: The following palettes were specified:
         [$5f77, $213d, $41a6, $40ee]
         [$3f65, $36b3, $262a, $50b0]
-        [$53c3, $3f65, $36b3]
-        [$5f77, $267c, $41a6]
-        [$267c, $213d, $40ee]
+        [$53c3, $3f65, $36b3, $8000]
+        [$5f77, $267c, $41a6, $8000]
+        [$267c, $213d, $40ee, $8000]
 Conversion aborted after 1 error

But, the Rust port also has a problem with this test case:

error: The GBC palette dump does not contain an integer amount of palettes (0 plus 8 bytes)
   ╭─[ at-file-ref.pal:1:1 ]
   │
 1 │ 
───╯

This change to try_fill_buf in pal_spec.rs fixes it:

-    while !nb_read != buf.len() {
+    while nb_read < buf.len() {

With that change, this is the output:

error: Some tiles cannot be displayed with the specified palettes
  │ 
  │ Note: The following palettes were specified:
  │         [$5f77, $213d, $41a6, $40ee]
  │         [$3f65, $36b3, $262a, $50b0]
  │         [$53c3, $3f65, $36b3, $ffff]
  │         [$5f77, $267c, $41a6, $ffff]
  │         [$267c, $213d, $40ee, $ffff]
  │       No palette contains colors [$6c8a, $7f55, $7fff]
──╯

It shows $ffff instead of $8000 for transparent, but is otherwise the same as the "broken" at-file-ref test case output. So I think it's fine, and this is an appropriate fix for the bug.

Thank you for figuring this out! Feel free to submit a PR against rsgbds to fix that bug. <3

This change is sufficient to fix the reported segfault without even changing the at-file-ref output:

-               for (size_t i = 0; i < options.nbColorsPerPal && (!spec[i] || spec[i]->isOpaque()); ++i) {
+               for (size_t i = 0; i < options.nbColorsPerPal; ++i) {
                        // If the spec has a gap, there's no need to copy anything.
-                       if (spec[i]) {
+                       if (spec[i] && !spec[i]->isTransparent()) {

Frankly, I expect that even with this fix, there are other bugs lurking in the palette-generation code: maybe not segfaults, but correctness-according-to-spec bugs. That goes for both the C++ and Rust versions. Maybe at some point I'll rewrite RGBGFX, in one language or another; but it's at least nice to have "no known bugs".