KarolS/millfork

Functions (including macros) can improperly track the contents of a register when optimizing

agg23 opened this issue · 3 comments

agg23 commented

In some situations the optimizer seems either to be overeager or to lose track of which variables are in which registers when it emits a T*A instruction (TXA or TYA).

There may be a more minimal repro than this, but this is the best I could come up with. This one does not reproduce with macros (though I have seen the problem with macros elsewhere), but it does with normal functions and with inline functions:

array(byte) output [256] @ $200
array(byte) input = [$10, $20, $30, $40, $50, $60, $70, $80, $90, $A0]
byte write_index

void main() {
  init_rw_memory()
  write_index = 0

  byte j
  bool flip
  for j,0,until,10 {
    flip = j != 5
    add_data_i(j, flip, $FF, 1, 2)
  }

  while (true) {}
}

void add_data_i(byte i, bool swap, byte first, byte data_1, byte data_2) {
  output[write_index] = first
  if (swap) {
    output[write_index + 1] = data_2
  } else {
    output[write_index + 1] = data_1
  }
  output[write_index + 2] = input[i]

  output[write_index + 3] = first
  if (swap) {
    output[write_index + 4] = data_1
  } else {
    output[write_index + 4] = data_2
  }
  output[write_index + 5] = input[i]

  write_index += 6
}

void nmi() {
  
}

void irq() {

}

Compiled with java -jar millfork.jar main.mfk -o build/rom.nes -t nes_small -g -s -fsource-in-asm -fillegals -O4 results in the following output snippet:

...
    LDX write_index
    STA $204, X
; 
;line:34:main.mfk
;     output[write_index + 5] = input[i]
    LDY add_data_i$i
    LDA input.array, Y
    STA $205, X
; 
;line
    TYA ; <- problematic transfer
; 
;line:36:main.mfk
;     write_index += 6
    CLC
    ADC #6
    STA write_index

...

For this particular example, the problem begins to appear at O4 and higher. In my codebase, this issue occurs at O2 and higher. For some reason at O4 the compiler thinks write_index is in Y, even though it just loaded i into Y instead.

Note that in the O4 version, simply substituting TXA for TYA would produce the correct output. Running at O3 or lower does not load write_index in advance; instead it inserts LDY write_index immediately before the store:

    STA $204, Y
; 
;line:34:main.mfk
;     output[write_index + 5] = input[i]
    LDY add_data_i$i
    LDA input.array, Y
    LDY write_index
    STA $205, Y
; 
;line
    TYA 
; 
;line:36:main.mfk
;     write_index += 6
    CLC
    ADC #6
    STA write_index
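
To make the difference between the two listings concrete, here is a minimal sketch in Python that models only the register transfers shown above. It is not how Millfork or a real 6502 works internally: the `run` function, the tuple encoding of instructions, and the sample values 12 and 2 are all hypothetical, and the array loads and stores (LDA input.array, Y and STA $205, X) are left out because they do not change X or Y and the value they leave in A is overwritten by the transfer.

```python
# Hypothetical, heavily simplified model of the A/X/Y registers.
# Only the instructions relevant to the final write_index update are handled.
def run(program, write_index, i):
    mem = {"write_index": write_index, "i": i}
    reg = {"A": 0, "X": 0, "Y": 0}
    for op, arg in program:
        if op in ("LDX", "LDY"):
            reg[op[2]] = mem[arg]               # load a variable into X or Y
        elif op in ("TXA", "TYA"):
            reg["A"] = reg[op[1]]               # copy X or Y into A
        elif op == "ADC":
            # CLC is assumed to come first, so the carry-in is 0
            reg["A"] = (reg["A"] + arg) & 0xFF
        elif op == "STA":
            mem[arg] = reg["A"]
        else:
            raise ValueError(op)
    return mem["write_index"]

# Tail of the O4 listing: write_index lives in X, but Y (holding i) is copied.
O4_TAIL = [("LDX", "write_index"), ("LDY", "i"),
           ("TYA", None), ("ADC", 6), ("STA", "write_index")]

# Same tail with TXA substituted for TYA.
O4_FIXED = [("LDX", "write_index"), ("LDY", "i"),
            ("TXA", None), ("ADC", 6), ("STA", "write_index")]

# Tail of the O3 listing: Y is reloaded with write_index before TYA.
O3_TAIL = [("LDY", "i"), ("LDY", "write_index"),
           ("TYA", None), ("ADC", 6), ("STA", "write_index")]

print(run(O4_TAIL, 12, 2))   # 8: i + 6 instead of write_index + 6
print(run(O4_FIXED, 12, 2))  # 18
print(run(O3_TAIL, 12, 2))   # 18
```

With write_index = 12 and i = 2, the O4 sequence stores 8 (i + 6) back into write_index, while both the TXA variant and the O3 sequence store the expected 18.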

I've managed to replicate this bug in version 0.3.24, but I failed to replicate it on the current HEAD. Maybe it's fixed, maybe not. The funny thing is that I have barely touched the 6502 optimizations since then.

The bug was caused by the 0th case of LaterOptimizations.IndexSwitchingOptimization, but whether this is a bug with that optimization in particular, or with something else, I cannot tell right now.

Does this still happen in 0.3.28? If no one can confirm it, I'll close this.

Closing due to inability to replicate.