Apple II floating bus does not switch modes at the right time
ryandesign opened this issue · 2 comments
I noticed this problem as I began to write a program to automate the testing of emulator behavior with regard to the Apple II floating bus and vertical blanking.
On my real unenhanced Apple IIe if I am in text mode and the hires switch is on and the mixed switch is off and I switch to graphics mode while loading a byte from the floating bus:
sta $C051 ;text mode on
sta $C052 ;mixed mode off
sta $C057 ;hires mode on
ldx $C050 ;graphics mode on and load byte from floating bus
ldy $C050 ;load another byte from floating bus
then the first byte that got loaded into the X register is a byte from the text screen that was being displayed and the second byte that got loaded into the Y register is a byte from the hires screen that is now being displayed.
Virtual ][ 11.4 and OpenEmulator 1.1.1-202203110628 get it wrong: their floating bus switches to the new video mode too soon: both bytes are taken from the hires screen.
Clock Signal 23.10.29 (plus the fix for #1196 applied) gets it wrong the other way: its floating bus switches to the new video mode too late: both bytes are from the text screen. In fact I have to delay for thousands of cycles after switching video modes before Clock Signal's floating bus returns data from the new video mode.
Here is the source of my sample program that demonstrates the problem.
; SPDX-FileCopyrightText: © 2023 Ryan Carsten Schmidt <https://github.com/ryandesign>
; SPDX-License-Identifier: MIT
;save as instant.s and assemble and link with:
;cl65 -t apple2 -C apple2-asm.cfg --start-addr 0x1000 -u __EXEHDR__ -o instant instant.s
KBD = $C000 ;keyboard value
KBDSTRB = $C010 ;keyboard strobe
TXTCLR = $C050 ;graphics
TXTSET = $C051 ;text
MIXCLR = $C052 ;no split
HIRES = $C057 ;hires
PRBLNK = $F948 ;print 3 spaces
INIT = $FB2F ;set text mode, page 1, lores, standard text window
WAIT = $FCA8 ;delay (26+27*A+5*A*A)/2 cycles (A>0)
HOME = $FC58 ;clear text screen 1 and move cursor to top left
PRBYTE = $FDDA ;print A as hex
SETNORM = $FE84 ;set normal text
SETKBD = $FE89 ;set KSW to KEYIN
SETVID = $FE93 ;set CSW to COUT1
.proc main
jsr SETNORM ;normal text
jsr INIT ;text mode, page 1, lores, standard text window
jsr SETVID ;standard output
jsr SETKBD ;standard input
jsr HOME ;clear text screen
ldx #$0 ;init low byte counter
ldy #$14 ;init high byte counter
lda #$42 ;load byte to fill memory with
@loreshi: sty @loreslo+2 ;set high byte of address of sta below
@loreslo: sta $1400,x ;store the byte (address modified by sty above)
inx ;next low byte
bne @loreslo ;loop until done
iny ;next high byte
cpy #$18 ;compare against last high byte
bne @loreshi ;loop until done
ldy #$20 ;init high byte counter
lda #$7F ;load byte to fill memory with
@hireshi: sty @hireslo+2 ;set high byte of address of sta below
@hireslo: sta $2000,x ;store the byte (address modified by sty above)
inx ;next low byte
bne @hireslo ;loop until done
iny ;next high byte
cpy #$40 ;compare against last high byte
bne @hireshi ;loop until done
sta MIXCLR ;mixed mode off
sta HIRES ;hires mode on
@here: beq @load1st ;always
ldx TXTCLR ;graphics mode on and load floating bus byte
@delay: lda #80
jsr WAIT ;delay 17,093 cycles; exits with Z flag set
beq @load2nd ;always
@load1st: ldx TXTCLR ;graphics mode on and load floating bus byte
@load2nd: ldy TXTCLR ;load another floating bus byte
sta TXTSET ;text mode on
txa ;transfer 1st floating bus byte to A
jsr PRBYTE ;print byte
jsr PRBLNK ;print 3 spaces
tya ;transfer 2nd floating bus byte to A
jsr PRBYTE ;print byte
@waitkey: lda KBD ;load keypress
bpl @waitkey ;loop until keypress
sta KBDSTRB ;indicate keypress handled
bmi main ;always
.endproc
You can poke it into memory by entering the monitor with:
CALL -151
and then pasting this in:
1000:20 84 FE 20 2F FB 20 93 FE 20 89
:FE 20 58 FC A2 00 A0 14 A9 42 8C 1A
:10 9D 00 14 E8 D0 FA C8 C0 18 D0 F2
:A0 20 A9 7F 8C 2C 10 9D 00 20 E8 D0
:FA C8 C0 40 D0 F2 8D 52 C0 8D 57 C0
:F0 0A AE 50 C0 A9 50 20 A8 FC F0 03
:AE 50 C0 AC 50 C0 8D 51 C0 8A 20 DA
:FD 20 48 F9 98 20 DA FD AD 00 C0 10
:FB 8D 10 C0 30 9B
Run it with the monitor command:
1000G
It shows and clears the text screen, fills $1400-$17FF (the area scanned by the Apple II/II+ floating bus during horizontal blanking when page 1 of text or lores graphics is shown) with $42, and fills $2000-$3FFF (hires page 1) with $7F. It then shows the hires screen by loading a byte from $C050 into X, loads another byte from $C050 into Y, switches back to text mode, prints the hex values of X and Y, and waits for a keypress before doing it all again.
The real Apple IIe usually prints A0 7F. A0 is a space with the high bit set (what the text screen is filled with) and 7F is what we filled the hires screen with. Instead of A0 it might print a value from the screen holes.
OpenEmulator and Virtual ][ print 7F 7F.
Clock Signal emulating a IIe prints A0 A0. Emulating a II/II+, we would often see 42 instead of A0. Instead of A0 or 42 we might see other bytes from the screen holes.
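The three outcomes can be reproduced with a toy model. This is only a sketch: `latency` is a hypothetical parameter (in CPU cycles) for how long after the triggering access the floating bus starts returning hires data, and the byte values assume the IIe case where the text screen holds A0. It is not how any of the emulators is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Toy model only: bytes the floating bus returns in each mode,
// matching what the test program leaves on screen on a IIe.
constexpr std::uint8_t kTextByte = 0xA0;   // space with the high bit set
constexpr std::uint8_t kHiresByte = 0x7F;  // fill value for hires page 1

// Hypothetical rule: a read at 'cycle' sees hires data once 'latency'
// cycles have passed since the access that flipped the soft switch.
inline std::uint8_t floating_bus_at(int cycle, int trigger_cycle, int latency) {
    assert(latency >= 0);
    return cycle >= trigger_cycle + latency ? kHiresByte : kTextByte;
}

// Returns the (X, Y) pair the test program would print.
// ldx $C050 reads on its 4th cycle (cycle 3); ldy $C050 reads 4 cycles later.
inline std::pair<std::uint8_t, std::uint8_t> run_test(int latency) {
    const int first_read = 3;
    const int second_read = first_read + 4;
    return {floating_bus_at(first_read, first_read, latency),
            floating_bus_at(second_read, first_read, latency)};
}
```

In this model `run_test(0)` gives 7F 7F (switching too soon), any latency from 1 to 4 gives A0 7F (what the real IIe prints), and a latency of thousands of cycles gives A0 A0 (switching too late).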
If you change the condition at @here: from beq ($F0) to bne ($D0) with the monitor command:
103B:D0
then a substantial delay of 17,093 cycles (approximately the duration of one complete frame, which would be 17,030 cycles) is introduced after switching to graphics mode and before reading the second floating bus byte, which seems to be a long enough delay to fix the problem.
You can experiment with different delay values by changing the value at @delay: for example, to reduce it from 80 ($50) to 40 ($28):
1041:28
With that delay, some of the time I am getting A0 7F (correct) and some of the time A0 A0 (switching too late). (The scaling of the delay value is quadratic, not linear.)
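To see how the delay scales, the WAIT formula from the listing can be evaluated directly. This is a small sketch; the frame length assumes NTSC timing of 65 cycles per line and 262 lines per frame, which matches the 17,030 figure above. The function names are made up for illustration.

```cpp
#include <cassert>

// Cycles consumed by the Monitor WAIT routine for accumulator value a,
// using the formula quoted in the listing: (26 + 27*A + 5*A*A) / 2, A > 0.
inline long wait_cycles(int a) {
    assert(a > 0 && a <= 255);
    return (26L + 27L * a + 5L * a * a) / 2;
}

// Assumption: NTSC timing, 65 CPU cycles per line and 262 lines per frame.
inline long frame_cycles() {
    return 65L * 262L;
}

// The same delay expressed as a fraction of one frame.
inline double wait_frames(int a) {
    return static_cast<double>(wait_cycles(a)) / static_cast<double>(frame_cycles());
}
```

`wait_cycles(80)` is 17,093, just over one frame, while `wait_cycles(40)` is 4,553, a little over a quarter of a frame, so halving the value cuts the delay to roughly a quarter.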
The real Apple IIe's behavior makes sense to me based on my limited understanding of how the 6502 works and how the Apple II uses it. The 6502 talks to the RAM for one half of every cycle and the video hardware talks to RAM the other half of every cycle. Load instructions using absolute addressing take four cycles. If we begin in text mode, mixed off, hires on, and assume that video hardware is beginning to scan the first pixel of the visible screen, and we consider an instruction like ldx $C050 executing at memory location $1047, then the sequence of events as I understand it is:
Cycle 0 (CPU): CPU places the PC value ($1047) on the address bus, computes PC=PC+1, and fetches the value from $1047 - the ldx opcode, AE - into the predecode register.
Cycle 0.5 (video): Video hardware places the first address of the text screen ($400) on the address bus, fetches the value, and displays the pixels for that byte by looking them up in the character generator ROM.
Cycle 1 (CPU): CPU places the PC value ($1048) on the address bus, transfers the predecode register to the instruction register, computes PC=PC+1, and fetches the value from $1048 - the low byte of the operand, 50 - into the input data latch.
Cycle 1.5 (video): Video hardware places the next address of the text screen ($401) on the address bus, fetches the value, and displays the pixels for that byte by looking them up in the character generator ROM.
Cycle 2 (CPU): CPU places the PC value ($1049) on the address bus, transfers the input latch to the B register, computes PC=PC+1, adds 0 to B, fetches the value from $1049 - the high byte of the operand, C0 - into the input data latch, and captures adder output into the adder hold register.
Cycle 2.5 (video): Video hardware places the next address of the text screen ($402) on the address bus, fetches the value, and displays the pixels for that byte by looking them up in the character generator ROM.
Cycle 3 (CPU): CPU would ordinarily place the values from the input data latch and adder hold registers ($C050) on the address bus but because that's a soft switch not mapped to actual memory it somehow skips that step, leaving the address bus set to its previous value ($402). CPU fetches the value from $402 - the last displayed character on the text screen - into the input data latch.
Cycle 3.5 (video): The act of mentioning the soft switch address $C050 has caused the video hardware to be in graphics mode now. Video hardware places the next address of the hires screen ($2003) on the address bus, fetches the value, and displays the pixels for that byte directly.
Cycle 4 (CPU): At the start of the next instruction, CPU places the PC value ($104A) on the address bus and transfers the input latch to the X register.
CPU would ordinarily place the values from the input data latch and adder hold registers ($C050) on the address bus but because that's a soft switch not mapped to actual memory it somehow skips that step, leaving the address bus set to its previous value ($402).
Probably the CPU does not in fact skip setting up the address bus, but instead somehow the address bus does not influence the data bus for soft-switch addresses.
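For reference, the addresses the CPU drives during a four-cycle absolute-mode load can be listed with a small helper. This sketch follows the standard 6502 cycle breakdown and uses the same cycle numbering (0 to 3) as the walkthrough; the function name is made up for illustration, and nothing here models the data bus.

```cpp
#include <array>
#include <cstdint>

// Addresses a 6502 places on the address bus during a 4-cycle
// absolute-mode load (e.g. ldx $C050) whose opcode sits at pc.
inline std::array<std::uint16_t, 4> absolute_load_bus_addresses(std::uint16_t pc,
                                                                std::uint16_t operand) {
    return {{
        pc,                                  // cycle 0: opcode fetch
        static_cast<std::uint16_t>(pc + 1),  // cycle 1: operand low byte
        static_cast<std::uint16_t>(pc + 2),  // cycle 2: operand high byte
        operand                              // cycle 3: read from the effective address
    }};
}
```

For `absolute_load_bus_addresses(0x1047, 0xC050)` the last entry is $C050, so the soft switch is addressed on cycle 3. Since no memory responds at that address, the value the CPU latches would be whatever the preceding video fetch left on the data bus.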
I don't have a real Apple II plus available for testing. It is possible that behavior differs between a II plus and a IIe. Don Lancaster describes this in his book Enhancing Your Apple II and IIe Volume 2, Enhancement 13, The Vaporlock:
Page 206:
8AFF: 166 ; The FIX2+ routine provides one extra
8AFF: 166 ; delay cycle to adjust for screen switching
8AFF: 166 ; differences between the IIe and II+.
Page 208:
8BAB:A9 06 265 FIX2+ LDA #IDBYTE ; ADD ONE EXTRA CYCLE ONLY ON
8BAD:CD B3 FB 266 CMP VERSION ; THE II+ TO EQUALIZE ON-SCREEN
8BB0:D0 00 8BB2 267 BNE SHOW ; DISPLAY MODE SWITCHING
8BB2:2C 20 C0 269 SHOW BIT SNIFF ; OPTIONAL MODE CHANGES GO HERE
The implementation error here is that:
- video is output just-in-time, i.e. if the machine doesn't do anything to affect video state (either by soft switches or by writing to the active video area) then no work is done to produce video;
- acts such as changing mode because the processor hit C050 are enqueued to occur two cycles after whenever the CPU did them;
- vapour reads are implemented via video lookahead on the assumption that it's usually not intentional, so not worth incurring the cost of doing the actual video work; and
- bug here: that lookahead fails to take deferred actions into account.
The quick-hack fix is to modify line 745 of AppleII.cpp so that its section reads:
if(isReadOperation(operation) && address != 0xc000) {
update_video();
*value = video_.get_last_read_value(cycles_since_video_update_);
}
i.e. add an update_video call. Then, as if by magic:
The real fix will be marginally more involved, either:
- adding a means of lookahead to DeferredQueuePerformer and maintaining a temporary copy of Switches within VideoBase::get_last_read_value; or
- eliminating the optimisation of treating vapour reads as lookahead and hence eliminating offset as an argument to get_last_read_value and proceeding as if it were 0.
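As a rough illustration of the first option, here is a toy model of a lookahead read that ignores pending deferred actions next to one that applies them to a temporary copy of the switch state. Every type and function name below is a hypothetical stand-in, not Clock Signal's actual code, and treating an action due exactly at the lookahead offset as already applied is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-ins; these are NOT Clock Signal's actual classes.
struct ToySwitches {
    bool text = true;
};

struct ToyDeferredAction {
    int cycles_from_now;  // delay until the action takes effect
    bool new_text;        // value to assign to ToySwitches::text
};

struct ToyVideo {
    ToySwitches switches;
    std::vector<ToyDeferredAction> pending;  // ordered by cycles_from_now

    // Byte the floating bus would show for a given switch state
    // (A0 from the text screen, 7F from the hires screen).
    static std::uint8_t byte_for(const ToySwitches &s) {
        return s.text ? 0xA0 : 0x7F;
    }

    // Buggy lookahead: predicts the byte 'offset' cycles ahead from the
    // current switch state only, ignoring pending actions.
    std::uint8_t naive_last_read_value(int offset) const {
        (void)offset;
        return byte_for(switches);
    }

    // Lookahead on a temporary copy: apply every pending action that
    // falls due within 'offset' cycles (assumed inclusive), then predict.
    std::uint8_t lookahead_last_read_value(int offset) const {
        assert(offset >= 0);
        ToySwitches temp = switches;
        for (const ToyDeferredAction &action : pending) {
            if (action.cycles_from_now <= offset) temp.text = action.new_text;
        }
        return byte_for(temp);
    }
};
```

With a switch to graphics queued two cycles out, `naive_last_read_value(4)` still returns A0, while `lookahead_last_read_value(4)` returns 7F and leaves the real switch state untouched.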
I'll try to figure out whether the whole lookahead thing is actually saving any real costs in order to pick a route. Quite possibly it's not.