Possible optimisation for SB/LB/LAB/SAB/DAB

Question

ZornsLemma opened this issue 7 years ago · 4 comments

It occurred to me that there's potential for a small optimisation on the byte read/write opcode implementations. If we take SB as an example, it's currently:

SB      LDA     ESTKL,X
        STA     TMPL
        LDA     ESTKH,X
        STA     TMPH
        LDA     ESTKL+1,X
        STY     IPY
        LDY     #$00
        STA     (TMP),Y
        LDY     IPY
        INX
;       INX
;       JMP     NEXTOP
        JMP     DROP

If we're willing to use self-modifying code, we can write that as:

SB      LDA     ESTKL,X
        STA     SBSTA+1
        LDA     ESTKH,X
        STA     SBSTA+2
        LDA     ESTKL+1,X
SBSTA   STA     $0000
        INX
;       INX
;       JMP     NEXTOP
        JMP     DROP

which by my count saves 3 bytes and 8 cycles.
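
For illustration, the same trick applied to LB might look something like the sketch below. It assumes LB replaces the address on top of the evaluation stack with the zero-extended byte it points to and finishes with JMP NEXTOP. LBLDA is a made-up label, and this version is untested.

```asm
; Sketch only: self-modifying LB, assuming code runs from writable memory
LB      LDA     ESTKL,X         ; low byte of source address
        STA     LBLDA+1         ; patch operand low byte
        LDA     ESTKH,X         ; high byte of source address
        STA     LBLDA+2         ; patch operand high byte
LBLDA   LDA     $FFFF           ; operand patched above (16-bit placeholder)
        STA     ESTKL,X         ; result low byte replaces the address
        LDA     #$00
        STA     ESTKH,X         ; zero-extend the result
        JMP     NEXTOP
```

Because Y is never touched, the save and restore of IPY disappears here as well.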

If we're not willing to use self-modifying code, but we are willing to set aside two consecutive zp locations ZEROL and ZEROH and initialise ZEROL permanently to 0 on VM startup, we can write that as:

SB      STY     IPY
        LDY     ESTKL,X
        LDA     ESTKH,X
        STA     ZEROH
        LDA     ESTKL+1,X
        STA     (ZEROL),Y
        LDY     IPY
        INX
;       INX
;       JMP     NEXTOP
        JMP     DROP

which by my count saves 4 bytes and 5 cycles compared to the original code (although we'd lose 4 bytes to the one-off initialisation of ZEROL in VMINIT, but we'd still come out ahead applying this optimisation across multiple byte read/write opcodes).
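
To make the bookkeeping concrete, here is a rough sketch of the one-off initialisation plus the same idea applied to LB. The exact placement inside VMINIT and the shape of LB (address on top of stack replaced by the zero-extended byte, then JMP NEXTOP) are assumptions, and like the SB version above it is untested.

```asm
; One-off setup, somewhere in VMINIT (4 bytes)
        LDA     #$00
        STA     ZEROL           ; pointer low byte stays 0 from here on

; Sketch only: LB via the (ZEROL),Y pointer
LB      STY     IPY             ; save bytecode index
        LDY     ESTKL,X         ; Y = low byte of source address
        LDA     ESTKH,X
        STA     ZEROH           ; pointer now holds $hh00
        LDA     (ZEROL),Y       ; read from $hh00 + Y
        STA     ESTKL,X         ; result low byte replaces the address
        LDA     #$00
        STA     ESTKH,X         ; zero-extend the result
        LDY     IPY             ; restore bytecode index
        JMP     NEXTOP
```

Since the pointer's low byte is always 0, adding Y can never cross a page boundary, so the indexed read does not pay a page-crossing penalty.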

(Disclaimer - I've given the self-modifying version a quick test and it seems to be fine - ROGUE runs :-) - but I haven't tested the other one at all.)

PS The non-self-modifying approach using ZEROL/H might also allow optimisation of the word-oriented versions of these opcodes.

Answer 1 · 2017-08-10T04:05:49.000Z

Very interesting. I’ll check the Apple II for write-protected memory banks.

Answer 2 · 2017-08-10T04:10:00.000Z

> Very interesting. I’ll check the Apple II for write-protected memory banks.

We can't guarantee to be running from writable memory, which is why I haven't suggested self-modification so far. It is certainly legal to write-protect the LC, for example. If we can spare two zero-page locations, then that's a good saving, but it might have to be an option in that case. There will be environments which don't have them available.

Answer 3 · 2017-08-10T04:31:44.000Z

We only need one ZP location if we overload TMPL and stick a zero before it.
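
Roughly, the idea would look like the sketch below. It assumes the byte immediately before TMPL can be reserved, set to 0 once at startup, and never written again; that layout is hypothetical and the code is untested.

```asm
; Sketch only: the byte at TMPL-1 is assumed to hold a permanent 0,
; so (TMPL-1) acts as a pointer whose high byte lives in TMPL
SB      STY     IPY             ; save bytecode index
        LDY     ESTKL,X         ; Y = low byte of target address
        LDA     ESTKH,X
        STA     TMPL            ; TMPL doubles as the pointer high byte
        LDA     ESTKL+1,X       ; value to store
        STA     (TMPL-1),Y      ; write to $hh00 + Y
        LDY     IPY             ; restore bytecode index
        INX
        JMP     DROP
```

The catch is that any other code using TMPL as scratch leaves the pointer's high byte stale, so it has to be reloaded on every use, as it is here.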

Answer 4 · 2017-08-12T22:58:55.000Z

So I took a stab at implementing the self-modifying code. Keeping the Apple II's language card bank write-enabled required enough extra code that it may be a wash, so it is off by default for the Apple II. For the Apple I and Apple III there is no such requirement, so it is enabled by default. Steve's other idea for using a zero in the ZP didn't offer up as much opportunity as I thought it might, so I didn't implement it. But there is a spot in the ZP variables to squeeze a zero in front of TMPL if we want to revisit this.
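
As a rough illustration of where the extra code comes from: on the Apple II the language card's write-enable is controlled by soft switches, and two consecutive reads of $C083 select bank 2 RAM for reading with writes enabled. Presumably the sequence has to be repeated wherever something else has switched the card back to write-protected or to ROM. LCRWEN is just an illustrative name, and the choice of bank 2 is an assumption here, not necessarily what the VM uses.

```asm
; Sketch only: re-enable writes to language card RAM (bank 2 assumed)
LCRWEN  =       $C083           ; soft switch: read RAM, write-enable on 2nd read
        BIT     LCRWEN          ; first read selects LC RAM for reading
        BIT     LCRWEN          ; second consecutive read enables writes
```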