dschmenk/PLASMA

Possible optimisation for SB/LB/LAB/SAB/DAB

ZornsLemma opened this issue · 4 comments

It occurred to me that there's potential for a small optimisation on the byte read/write opcode implementations. If we take SB as an example, it's currently:

SB      LDA     ESTKL,X
        STA     TMPL
        LDA     ESTKH,X
        STA     TMPH
        LDA     ESTKL+1,X
        STY     IPY
        LDY     #$00
        STA     (TMP),Y
        LDY     IPY
        INX
;       INX
;       JMP     NEXTOP
        JMP     DROP

If we're willing to use self-modifying code, we can write that as:

SB      LDA     ESTKL,X
        STA     SBSTA+1
        LDA     ESTKH,X
        STA     SBSTA+2
        LDA     ESTKL+1,X
SBSTA   STA     $0000
        INX
;       INX
;       JMP     NEXTOP
        JMP     DROP

which by my count saves 3 bytes and 8 cycles.
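
As a sketch of how the same operand-patching trick might carry over to the load side, here is an untested LB along the same lines. It assumes LB replaces the address on top of the evaluation stack with the zero-extended byte it points to. The LBLDA label is made up for illustration, and the $FFFF placeholder is only there so the assembler picks absolute (3-byte) addressing.

```asm
LB      LDA     ESTKL,X         ; low byte of the address on top of stack
        STA     LBLDA+1         ; patch operand low byte
        LDA     ESTKH,X         ; high byte of the address
        STA     LBLDA+2         ; patch operand high byte
LBLDA   LDA     $FFFF           ; operand is overwritten at run time
        STA     ESTKL,X         ; replace the address with the loaded byte
        LDA     #$00
        STA     ESTKH,X         ; zero-extend to 16 bits
        JMP     NEXTOP
```

Y is never touched in this form, so the IPY save/restore disappears here too.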

If we're not willing to use self-modifying code, but we are willing to set aside two consecutive zp locations ZEROL and ZEROH and initialise ZEROL permanently to 0 on VM startup, we can write that as:

SB      STY     IPY
        LDY     ESTKL,X
        LDA     ESTKH,X
        STA     ZEROH
        LDA     ESTKL+1,X
        STA     (ZEROL),Y
        LDY     IPY
        INX
;       INX
;       JMP     NEXTOP
        JMP     DROP

which by my count saves 4 bytes and 5 cycles compared to the original code. (We'd lose 4 bytes to the one-off initialisation of ZEROL in VMINIT, but we'd still come out ahead by applying this optimisation across multiple byte read/write opcodes.)
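
For reference, the one-off setup mentioned above could look like the fragment below. This is only a sketch: it assumes ZEROL and ZEROH are adjacent zero-page locations and that ZEROL is never written again afterwards. Where exactly it would sit inside VMINIT is left open.

```asm
        LDA     #$00            ; 2 bytes
        STA     ZEROL           ; 2 bytes (zero-page store); pointer low byte stays 0
```

With ZEROL fixed at 0, the pointer at (ZEROL) always addresses the start of the page held in ZEROH, and Y supplies the offset within that page.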

(Disclaimer - I've given the self-modifying version a quick test and it seems to be fine - ROGUE runs :-) - but I haven't tested the other one at all.)

PS The non-self-modifying approach using ZEROL/H might also allow optimisation of the word-oriented versions of these opcodes.
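
A rough, untested sketch of what a word store might look like with this approach, assuming SW takes the address from the top of the stack and the 16-bit value from the entry below it (mirroring SB). SWNOPG is a made-up label.

```asm
SW      STY     IPY
        LDY     ESTKL,X         ; address low byte becomes the index
        LDA     ESTKH,X
        STA     ZEROH           ; address high byte selects the page
        LDA     ESTKL+1,X
        STA     (ZEROL),Y       ; store value low byte
        INY
        BNE     SWNOPG          ; no wrap: high byte is on the same page
        INC     ZEROH           ; Y wrapped to 0: move to the next page
SWNOPG  LDA     ESTKH+1,X
        STA     (ZEROL),Y       ; store value high byte
        LDY     IPY
        INX
        JMP     DROP
```

The page-crossing check is the price of using the address low byte as the Y index, and it eats into whatever the word opcodes would otherwise save.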

So I took a stab at implementing the self-modifying code. Keeping the Apple II's language card bank write-enabled required enough extra code that it may be a wash, so it is off by default for the Apple II. For the Apple I and Apple III there is no such requirement, so it is enabled by default. Steve's other idea of keeping a zero in the ZP didn't offer up as much opportunity as I thought it might, so I didn't implement it. But there is a spot in the ZP variables to squeeze a zero in front of TMPL if we want to revisit this.