alekmaul/pvsneslib

deadlock in Ricoh/SPC700 communication

jeffythedragonslayer opened this issue · 3 comments

I've observed the Ricoh CPU get stuck in an infinite loop (in Mesen 2) here, at "sync with spc":

@next_block:
		
	lda	[digi_src2], y
	sta	spc2
	rep	#20h			; read 2 bytes
	lda	[digi_src], y		;
-:	cpx	REG_APUIO0		;-sync with spc
	bne	-			;
	inx				; increment v
	sta	REG_APUIO2		; write 2 bytes
	sep	#20h			;
	lda	spc2			; copy third byte
	sta	REG_APUIO1		;
	stx	REG_APUIO0		; send data
	iny				; increment pointer
	iny				;
	iny				;
	dec	spc1			; decrement block counter
	bne	@next_block		;

with X=$97 (8-bit index register) and APUIO0 = $00. During this time, the SPC700 was stuck in an infinite loop here:

1871	CMP $F4, #$80 [CPUIO0] = $97
1874	BCS $1871

with $00F4 = $80, which is "wait for snes" here in sm_spc.as7:

;**************************************************************************************
;* UPDATE STREAM
;**************************************************************************************
Streaming_Run:
;--------------------------------------------------------------------------------------
	mov	SPC_PORT0, #80h		; respond to SNES
;--------------------------------------------------------------------------------------
	push	a			; preserve regs
	push	x			;
	push	y			;
;--------------------------------------------------------------------------------------
_srw1:	cmp	SPC_PORT0, #80h		; wait for snes
	bcs	_srw1			;
;--------------------------------------------------------------------------------------
	mov	a, SPC_PORT0		; copy nchunks
	mov	stream_a, a		;
	mov	a, SPC_PORT1		; check for new note
	beq	_sr_nstart		;	
	call	Streaming_Activate	;

This situation is difficult to recreate, so I took a savestate. I don't have any steps other than "walk around aimlessly until the game hangs", and even then it doesn't happen all that often. Let me know if you need the savestate and the ROM to debug.

Excellent analysis courtesy of KungFuFurby:

Currently what's running through my mind is that you got really unlucky timing-wise. At least its protocol doesn't seem to be as awful as SNESGSS's in that regard. However, you deadlocked. That means both sides stopped in place.

I'm looking at all of the sound drivers. I think the problem lies in the interrupts. My guess is that spcProcessStream was interrupted at exactly the wrong time by a queue flusher of some kind that attempted to send a command to REG_APUIO0.

I'm working out the cause of the deadlock at the moment. The X register is supposed to be set to digi_copyrate... at least in theory. It looks like either the accumulator or the X register was overwritten in between @copysat and @next_block and was not properly restored by the interrupt routine.

The start signal on the SPC side was sent, so it is now expecting digi_copyrate from the SNES. Your X register holds a bad value simply because it is negative (bit 7 set). The other thing to check is that you are indexing a valid entry in digi_rates. So either you ran into an interrupt situation, or you accidentally got an invalid copy rate of some kind. I'm leaning towards the former for now.

This deadlock happened again, with the CPU hung in the same spot and the SPC700 in the state shown below. Mesen 2 was not able to disassemble anything on that side:

image

It is more complex than that.
The loop is mandatory because the CPU is waiting on the SPC700, but the SPC700 has hung somewhere in memory.
So I suspect that the SPC700 receives a command, or something else, that sends it into an unintended area of memory.
I can also reproduce it when I use too much RAM for music or BRR samples (more than 64K minus the size of the SPC700 code).