MLXXXp/Arduboy2

Save 10 bytes

Opened this issue · 5 comments

SPItransfer(pgm_read_byte(lcdBootProgram + i));

save 10 bytes by using:

SPDR = pgm_read_byte(lcdBootProgram + i);
while (!(SPSR & _BV(SPIF))); // wait

My question is: What mechanism causes the savings? It's counterintuitive that, more or less, duplicating code that is already provided in a function would reduce code size. The only difference is that the function includes an extra nop. If SPItransfer() isn't used anywhere else, or it's smaller than what it would take to call it, then the compiler should inline it. Otherwise, it should use a call. At most the change should only save the 2 bytes that the nop takes.

I tried the change with 10 Arduboy sketches chosen more or less randomly. With 6 of them the size didn't change, 2 of them saved 2 bytes, and 2 saved 10 bytes. So, there's some complex optimisation going on.

Unless it can be shown that future releases of the compiler, with different/improved optimisation schemes, will always result in equal or less code with this change, then I'm hesitant to implement it, especially since it could mean more maintenance and portability issues due to duplicating hardware specific code.

the code

SPDR = data;
while (!(SPSR & _BV(SPIF)));

compiles to 6 bytes of code. A call takes 4 bytes so it's a 2 bytes more. But the advantage with inline code is that the compiler is more flexible in choosing a general purpose register and can do better optimization. When a function is involved parameters are passed on in fixed registers. Making it less flexible.

I've noticed that the compiler is trying to be smart with for/while loops with functions and pointers involved and often ends up using r16,r17 and r28,r29 even though it doesn't really need to. These registes are preserved using the stack and that's where the code increase/optimization comes from.

I can't make any predictions of future compilers. but if code size would be increased it would likely not be limited to this piece of code.

While doing dissassemly compares I noticed I could optimize a few more bytes out and will do a assembly optimized version.

I can't make any predictions of future compilers. but if code size would be increased it would likely not be limited to this piece of code.

What I'm saying is that without this change, code size might decrease in the future. As you say, your new code takes 6 bytes. A call would take 4 bytes after setting up the parameter to be passed. If the compiler gets smarter with it's register handling in the future, the saving of registers to the stack may be eliminated, resulting in the original code compiling to less size than the new code.

Here's a C+assembly hybrid version that save ~10 ~20 bytes.

void Arduboy2Core::bootOLED()
{
    // reset the display
    uint8_t cmd;
    const void* ptr = lcdBootProgram;
    asm volatile(
    "1:                               \n\t" //assembly loop for 2nd delayShort(5)
    );
    delayShort(5);                          //for a short active low reset pulse
    asm volatile(
    "    sbic %[rst_port], %[rst_bit] \n\t" //continue if reset is active
    "    rjmp 2f                      \n\t" //else break
    "    sbi  %[rst_port], %[rst_bit] \n\t" //deactivate reset
    "    rjmp 1b                      \n\t" //loop for a recover from reset delay
    "2:                              \n\t"
    :
    : [rst_port] "I" (_SFR_IO_ADDR(RST_PORT)),
      [rst_bit]  "I" (RST_BIT)
    :
  );
  bitClear(CS_PORT, CS_BIT);               // select the display (permanently, since nothing else is using SPI)
  LCDCommandMode();
  asm volatile(
    "    ldi  r25, %[size]           \n\t" // for (uint8_t i = 0; i < sizeof(lcdBootProgram); i++) 
    "3:                              \n\t" // {
    "    lpm  %[cmd], Z+             \n\t" //   cmd = pgm_read_byte(lcdBootProgram + i));
    : [ptr] "+z" (ptr),
      [cmd] "=r" (cmd)
    : [size] "I" (sizeof(lcdBootProgram))
    : "r25"
  );    
  SPItransfer(cmd);                        //   transfer display command
  asm volatile(
    "    dec  r25                    \n\t" // }
    "    brne 3b                     \n\t"
    :
    :
    : "r25"
  );
  LCDDataMode();
}

EDIT: fixed Z pointer assembly operand in above code

@MrBlinky
since the release of the FX units use SPI storage - is this still compatible?