binji/pokegb

Crashes when going to the title screen

nadiaholmquist opened this issue ยท 13 comments

ROM MD5: 50927e843568814f7ed45ec4f944bd8b
crash info from macOS

binji commented

I never tested it on macOS, my guess is that some of the constants that I used for open/mmap/SDL_* are different on macOS. Does it run at all, or just crash right away?

The gamefreak logo and pokemon fight intro plays, then when it fades to white and is about to show the title screen, it crashes.

Skaermoptagelse.2021-05-28.kl.21.11.53.mov
binji commented

ah, I know what this is. I mmap the sav file and use ftruncate to make it 32k. This seems to work on Linux, but it looks like it fails on macOS. I wonder if there's a better way to do this... in any case, you should be able to fix it by running truncate rom.sav -s 32768 I think. (not sure if macOS has truncate, actually)

Creating a 32KB save file using dd did the trick. Looks like macOS doesn't have truncate or fallocate, so idk if there's a better way to create it - I'm much more familiar with the GNU utils than the Mac ones too.

Edit: ftruncate on macOS is weird.

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

int main(int argc, char** argv) {
	int test = open("test.bin", O_CREAT | O_RDWR, 0644);
	if (ftruncate(test, 32768) != 0) {
		perror("test"); //doesn't happen, even when  it seems to fail
	}

	close(test);
	return 0;
}
nhp@narshe Diverse % touch test.bin
nhp@narshe Diverse % ./ftruncate-test
nhp@narshe Diverse % du -h test.bin
  0B	test.bin
nhp@narshe Diverse % dd if=/dev/zero of=test.bin bs=1 count=65536
65536+0 records in
65536+0 records out
65536 bytes transferred in 0.172081 secs (380844 bytes/sec)
nhp@narshe Diverse % ./ftruncate-test
nhp@narshe Diverse % du -h test.bin
 32K	test.bin

Huh

It's supposed to do that.

On POSIX filesystems, there's a difference between how much data a file contains versus how much space it consumes. For example, on my Linux system, the I/O block size is 4,096 bytes, so even a 1-byte file will consume 4KB:

$ printf a > test.bin
$ hexdump -C test.bin
00000000  61                                                |a|
00000001
$ stat -c "bytes: %s blocks: %b block size: %B" test.bin
bytes: 1 blocks: 8 block size: 512

As you can see, the file is exactly 1 byte long, containing only a, but it takes up 8 * 512 = 4,096 bytes on disk.

The same thing works in the opposite direction: if you ftruncate() a file to extend it to a greater size, the filesystem doesn't actually allocate all that space until something tries to write to it - so the file can be much larger than its allocated size:

$ truncate -s 32K test.bin
$ hexdump -C test.bin 
00000000  61 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |a...............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00008000
$ stat -c "bytes: %s blocks: %b block size: %B" test.bin 
bytes: 32768 blocks: 8 block size: 512

The file is now 32,768 bytes long, containing an a followed by 0x7FFF NUL bytes, but it still takes up 8 * 512 = 4,096 bytes on disk. And if I hadn't put an a in there to start with, it would take up 0 bytes on disk.

binji commented

@Screwtapello thanks for the explanation! I wonder if there's a short way to get the behavior I want -- create or open a file of a fixed size on both macOS and Linux. Any ideas?

I had a brief search for ftruncate-related issues on macOS, and I think (based on the age of Stack Overflow answers) that the HFS+ filesystem used by older versions didn't support sparse (size larger than storage) files. I expect that's fixed with modern APFS disks, and regardless, that should be the portable way to make a file of a fixed size.

A segfault is a pretty typical result of writing beyond the mmap'd portion of a file. Since ftruncate() is called before mmap(), and since they both specify a size of 32768, I expect the file would be guaranteed to be the right size. If it's not, perhaps an fsync() between the two would help?

Things to check:

  • when the emulator crashes on macOS, how big is the save RAM file? Both "size in bytes" and "blocks allocated"
  • if you run the emulator under strace (Linux) or truss (macOS), do the ftruncate() or mmap() calls return an error?
binji commented

Thanks! I don't have a mac to test on so it's a bit tough for me. At least on my linux laptop it looks reasonable:

openat(AT_FDCWD, "rom.gb", O_RDONLY)    = 3
mmap(NULL, 1048576, PROT_READ, MAP_SHARED, 3, 0) = 0x7f1eb9e0d000
openat(AT_FDCWD, "rom.sav", O_RDWR|O_CREAT, 0666) = 4
ftruncate(4, 32768)                     = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0x7f1eba3fe000

For anybody looking for an easy copy-paste workaround:

dd if=/dev/zero of=rom.sav bs=1 count=32k

Ubuntu18.04 coredump

Since this crash has been observed on Linux, I thought I'd try it myself, but I can't reproduce it (Debian Testing, Linux 5.10 on XFS). Here's a shell-command to run from the directory containing pokegb and rom.gb:

rm -f rom.sav && strace -e open,openat,mmap,ftruncate ./pokegb 2>&1 | grep -A2 rom.sav
Explain this command...
  • remove the old rom.sav file, so we can test the fresh cretion process
  • trace the open, openat, mmap and ftruncate system calls during the execution of ./pokegb, and print their results to stderr
  • pipe stderr to stdout so we can filter it
  • show every line mentioning "rom.sav", and the 2 lines After it.

If I run the above command on my system, and close the pokegb window once the title-screen appears and "Blue Version" scrolls into place, I get the following output:

openat(AT_FDCWD, "rom.sav", O_RDWR|O_CREAT, 0666) = 4
ftruncate(4, 32768)                     = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0x7f8de8472000
Explain this output...
  • pokegb asks the kernel to open the file "rom.sav" in the current direcotry, creating it if it doesn't exist
  • the kernel responds "sure, it's file-descriptor 4"
  • pokegb asks the kernel to "truncate" file 4 to 32768 bytes long
  • the kernel responds "done!"
  • pokegb asks the kernel to map the first 32768 bytes of file 4 into memory somewhere
  • the kernel responds "sure, the file data is available at memory address 0x7f8de8472000"

It's likely that results of openat() and mmap() will vary each time the program is run, but they should be some sensible value. As a counter-example, I removed read permission from rom.sav (chmod a-rw rom.sav) and now the above command produces different output for me:

openat(AT_FDCWD, "rom.sav", O_RDWR|O_CREAT, 0666) = -1 EACCES (Permission denied)
ftruncate(-1, 32768)                    = -1 EBADF (Bad file descriptor)
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, -1, 0) = -1 EBADF (Bad file descriptor)

As expected, the openat() returns a permission error instead of a valid file descriptor, causing the following operations to fail with "Bad file descriptor" errors.

If somebody who can reproduce this crash on Linux can run that command and paste the output here, we might be able to figure out what's going on.

binji commented

This would be nice to fix, but in the meantime I added a simple workaround. You can run make rom.sav to copy from an empty save file.

interesting