Possible optimisation for SB/LB/LAB/SAB/DAB
ZornsLemma opened this issue · 4 comments
It occurred to me that there's potential for a small optimisation on the byte read/write opcode implementations. If we take SB as an example, it's currently:
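SB      LDA ESTKL,X     ; destination address, low byte (top of stack)
        STA TMPL
        LDA ESTKH,X     ; destination address, high byte
        STA TMPH
        LDA ESTKL+1,X   ; byte to store (next stack entry)
        STY IPY         ; save Y in IPY
        LDY #$00
        STA (TMP),Y     ; store through the zero-page pointer
        LDY IPY         ; restore Y
        INX
;       INX
;       JMP NEXTOP
        JMP DROP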
If we're willing to use self-modifying code, we can write that as:
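SB      LDA ESTKL,X     ; destination address, low byte
        STA SBSTA+1     ; patch operand low byte of the STA below
        LDA ESTKH,X     ; destination address, high byte
        STA SBSTA+2     ; patch operand high byte
        LDA ESTKL+1,X   ; byte to store
SBSTA   STA $0000       ; absolute store; operand rewritten above
        INX
;       INX
;       JMP NEXTOP
        JMP DROP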
which by my count saves 3 bytes and 8 cycles: 19 bytes and 29 cycles against 22 bytes and 37 cycles for the original.
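The same patching trick should carry over to the read opcodes. A rough, untested sketch for LB follows, assuming LB replaces the address on top of the stack with the zero-extended byte it points to. The label LBLDA is made up for illustration, and the $FFFF placeholder is only there so that an assembler cannot shorten the instruction to its zero-page form:

```asm
LB      LDA ESTKL,X     ; source address, low byte (top of stack)
        STA LBLDA+1     ; patch operand low byte of the LDA below
        LDA ESTKH,X     ; source address, high byte
        STA LBLDA+2     ; patch operand high byte
LBLDA   LDA $FFFF       ; placeholder operand, rewritten above (hypothetical label)
        STA ESTKL,X     ; replace the address with the loaded byte
        LDA #$00
        STA ESTKH,X     ; assumed: result is zero-extended to 16 bits
        JMP NEXTOP
```

As with SB, this only works where the handler itself sits in writable memory.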
If we're not willing to use self-modifying code, but we are willing to set aside two consecutive zp locations ZEROL and ZEROH and initialise ZEROL permanently to 0 on VM startup, we can write that as:
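SB      STY IPY         ; save Y in IPY
        LDY ESTKL,X     ; address low byte becomes the index
        LDA ESTKH,X
        STA ZEROH       ; address high byte; ZEROL is always 0
        LDA ESTKL+1,X   ; byte to store
        STA (ZEROL),Y   ; writes to ZEROH*256 + Y
        LDY IPY         ; restore Y
        INX
;       INX
;       JMP NEXTOP
        JMP DROP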
which by my count saves 4 bytes and 5 cycles compared to the original code (18 bytes and 32 cycles against 22 and 37). We'd lose 4 bytes to the one-off initialisation of ZEROL in VMINIT, but we'd still come out ahead by applying this optimisation across multiple byte read/write opcodes.
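For reference, the one-off setup would be something like the following. This is only a sketch, assuming ZEROL and ZEROH are defined as two adjacent zero-page locations that nothing else writes to; where exactly it would go in VMINIT is left open.

```asm
        LDA #$00        ; 2 bytes
        STA ZEROL       ; 2 bytes; low byte of the pointer stays 0 from here on
```

With ZEROL fixed at 0, (ZEROL),Y resolves to ZEROH*256 + Y, so putting the low byte of the address in Y and the high byte in ZEROH reaches the same location as the TMPL/TMPH pointer with Y = 0.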
(Disclaimer - I've given the self-modifying version a quick test and it seems to be fine - ROGUE runs :-) - but I haven't tested the other one at all.)
PS The non-self-modifying approach using ZEROL/H might also allow optimisation of the word-oriented versions of these opcodes.
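A rough idea of what that might look like for SW, untested, and assuming SW takes the address on top of the stack and the word to store beneath it, in the same layout as SB. The label SWHI is hypothetical. The page-crossing check is needed because the low byte of the address now lives in Y:

```asm
SW      STY IPY         ; save Y in IPY
        LDY ESTKL,X     ; address low byte becomes the index
        LDA ESTKH,X
        STA ZEROH       ; address high byte
        LDA ESTKL+1,X   ; low byte of the word
        STA (ZEROL),Y
        INY             ; step to the next address
        BNE SWHI
        INC ZEROH       ; index wrapped from $FF to $00: move to the next page
SWHI    LDA ESTKH+1,X   ; high byte of the word (hypothetical label)
        STA (ZEROL),Y
        LDY IPY         ; restore Y
        INX
        JMP DROP
```

The BNE/INC pair eats into the saving, so any gain over a TMPL/TMPH version is likely to be smaller than for the byte opcodes.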
So I took a stab at implementing the self-modifying code. Keeping the Apple II's language card bank write-enabled required enough extra code that it may be a wash, so it is off by default for the Apple II. For the Apple I and Apple III there is no such requirement, so it is enabled by default. Steve's other idea of keeping a zero in the ZP didn't offer as much opportunity as I thought it might, so I didn't implement it. But there is a spot in the ZP variables to squeeze a zero in front of TMPL if we want to revisit this.
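For context on the Apple II cost: language card RAM only accepts writes after two successive reads of one of its write-enable soft switches, so a handler that patches itself has to make sure that state holds whenever it runs. A minimal illustration, assuming the VM code lives in bank 2 (an assumption here, not something stated above):

```asm
        BIT $C083       ; first read: select bank 2 RAM for reading
        BIT $C083       ; second read: also enable writes to it
```

Bank 1 uses $C08B in the same way.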