Humpheh/goboy

Speed up CPU

ear7h opened this issue · 5 comments

ear7h commented

Just found this project and saw that one of your goals was to speed up the CPU. I think I found an optimization that can be made in the following function. Currently there is a switch statement, but I think a lookup table in the form of a map[byte]func(*Gameboy) or a [numOpCodes]func(*Gameboy) (a fixed-size array) would be more efficient.

// ExecuteOpcode is a large switch statement containing the opcode operations.
func (gb *Gameboy) ExecuteOpcode(opcode byte) {

ear7h commented

Quick update:
I'm currently working on a PR for this issue. The following references shed some light on the subject. It also looks like the handling of the CB instruction can benefit slightly from using static arrays.

Array access optimization
https://stackoverflow.com/a/43942734

Golang issue tracking jump tables
golang/go#5496

Hey @ear7h, this is really interesting. I've thought about using this method before but never got around to benchmarking it.

It would be great if we could compare the three methods of doing it. From recent experience with optimising the PPU I'm pretty sure that [numOpCodes]func(*Gameboy) would end up being quicker than map[byte]func(*Gameboy) as I found the mapaccess1 to be a large overhead (probably due to the hashing), and as the opcodes are all integers up to 0xFF I can't see a reason not to use the array.
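For illustration, the array approach might look something like the sketch below. This is not the code from the PR: the Gameboy struct here is a stripped-down stand-in with only two registers, the handlers ignore flags, and the bool return value is an assumption made so that unassigned opcodes are easy to see.

```go
package main

import "fmt"

// Gameboy is a hypothetical, stripped-down stand-in for the emulator state.
// The real struct holds far more than two registers.
type Gameboy struct {
	A byte // accumulator
	B byte
}

// numOpCodes covers every value a byte opcode can take (0x00 to 0xFF).
const numOpCodes = 256

// opcodeTable maps each opcode to its handler. Entries that are not listed
// stay nil. Flag handling is left out to keep the sketch short.
var opcodeTable = [numOpCodes]func(*Gameboy){
	0x00: func(gb *Gameboy) {},              // NOP
	0x04: func(gb *Gameboy) { gb.B++ },      // INC B
	0x3C: func(gb *Gameboy) { gb.A++ },      // INC A
	0x78: func(gb *Gameboy) { gb.A = gb.B }, // LD A,B
}

// ExecuteOpcode looks the handler up by index instead of walking a switch.
// It reports false when no handler is registered for the opcode.
func (gb *Gameboy) ExecuteOpcode(opcode byte) bool {
	op := opcodeTable[opcode]
	if op == nil {
		return false
	}
	op(gb)
	return true
}

func main() {
	gb := &Gameboy{B: 5}
	gb.ExecuteOpcode(0x78) // LD A,B
	gb.ExecuteOpcode(0x3C) // INC A
	fmt.Println(gb.A)      // prints 6
}
```

Because the index is a byte and the array has 256 entries, every possible opcode is in range, so no hashing is involved and the lookup is a plain indexed load.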

I think it would be good to find some of the places where slices are used instead of arrays - I imagine there are a few cases where I have done this.

ear7h commented

My PR has the array table. I proposed a map because I wasn't sure how dense the opcodes were (I've never worked with emulators before). The PR doesn't have any benchmarks. Are there any resources you know of that would make this easier? I tried to find some sort of benchmarking ROM, but a cursory search didn't yield anything.

That's fair, thanks for contributing! There is a benchmarking library built into Go which is good to use. I added a benchmark to #21 which just randomly calls opcodes on a blank ROM. I feel like that gives a good indication of the switch/array performance even if it doesn't fully exercise the opcode functionality.
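As a rough idea of what such a comparison can look like with the standard testing package, here is a minimal sketch. It is not the benchmark from #21: the cpu type, the three opcodes and all function names are made up for illustration, and with so few cases the switch is not representative of the real ExecuteOpcode. It uses testing.Benchmark so it can run as a normal program; in a _test.go file the two Benchmark functions could be run with go test -bench instead.

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// cpu is a hypothetical two-register stand-in for the emulator state.
type cpu struct{ a, b byte }

// execSwitch dispatches with a switch statement.
func execSwitch(c *cpu, opcode byte) {
	switch opcode {
	case 0x04:
		c.b++
	case 0x3C:
		c.a++
	case 0x78:
		c.a = c.b
	}
}

// table holds the same three handlers in a fixed-size array.
var table = [256]func(*cpu){
	0x04: func(c *cpu) { c.b++ },
	0x3C: func(c *cpu) { c.a++ },
	0x78: func(c *cpu) { c.a = c.b },
}

// execTable dispatches through the array; nil entries are skipped.
func execTable(c *cpu, opcode byte) {
	if op := table[opcode]; op != nil {
		op(c)
	}
}

// randomOpcodes returns n pseudo-random opcodes from a fixed seed so both
// benchmarks see the same sequence.
func randomOpcodes(n int) []byte {
	r := rand.New(rand.NewSource(1))
	ops := make([]byte, n)
	for i := range ops {
		ops[i] = byte(r.Intn(256))
	}
	return ops
}

// sink keeps the final state reachable so the work is not optimized away.
var sink cpu

func BenchmarkSwitch(b *testing.B) {
	ops, c := randomOpcodes(4096), &cpu{}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		execSwitch(c, ops[i%len(ops)])
	}
	sink = *c
}

func BenchmarkTable(b *testing.B) {
	ops, c := randomOpcodes(4096), &cpu{}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		execTable(c, ops[i%len(ops)])
	}
	sink = *c
}

func main() {
	fmt.Println("switch:", testing.Benchmark(BenchmarkSwitch))
	fmt.Println("table: ", testing.Benchmark(BenchmarkTable))
}
```

The numbers will vary by machine and Go version, so they are only meaningful relative to each other on the same setup.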

Closing as this was implemented in #21.