tinymovr/Tinymovr

Doesn't run with compiler optimisations

Closed this issue · 10 comments

  1. I've noticed that when flashing the release version of the code, the ESC can't spin the motor. I've tried -O1, -O2 and -O3, and only -O1 works. Usage of -O3 is generally discouraged, but it would be nice to run at -O2 for full optimisation and a lower likelihood of re-entering the control block, etc. Do you have any idea why -O2 might not be working? I am able to communicate via CAN, but I can't get it to calibrate or the motor to spin. I'm flashing via J-Link.

  2. Additionally, for debugging, I've found the following (sorry if you already knew this, but I thought I'd record it anyway):

  • Using -g2 is much more useful for debugging via the J-Link, and actually shows the local variables whereas -g1 does not.
  • The debug optimisation flag -Og doesn't seem to make a difference compared to the -O1 we already use for debug.
  • No optimisation (-O0) results in a compiler error, as the resulting binary is probably too large to fit in memory.
    As a result, I think it would be worthwhile changing the debug flag to -g2.
  3. Finally, with the VSCode workflow, I noticed that with my PR (#207) it wasn't actually rebuilding when switching to release, so I was unknowingly flashing in debug the whole time (which is why it worked). I think this is what you mentioned here.
    When I make clean and then build and flash release, the issue in 1 occurs. This is a bit confusing, so I propose we change the preLaunchTasks to "Clean and Build Project (<Release/Debug>)" to force a rebuild of the expected version (rebuilding doesn't take long anyway), along the lines of the sketch below. What do you think?
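
For illustration, a hedged sketch of what that change could look like in .vscode/launch.json; the configuration name, debugger type and task label here are assumptions, not the repo's actual files:

```jsonc
{
    // Illustrative excerpt of a single launch configuration.
    "name": "Flash & Debug (Release)",
    "type": "cortex-debug",
    "request": "launch",
    // Run a clean rebuild first, so a stale debug binary is never flashed:
    "preLaunchTask": "Clean and Build Project (Release)"
}
```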

Hi @eufrizz ,

  1. Do you get an error? Does calibration run? Which gcc version are you using? Have you tried the prebuilt release binaries? This is odd, as we flash the release version on every board we distribute and then test it; in this sense the release build is always verified.
  2. Cool, thanks for sharing your findings! -g2 was the original flag, but we found that it generates linker errors with arm gcc12: some loops in the nvm_save function are substituted with calls to memcpy, and while the function itself needs to be in RAM, memcpy resides in flash, so the linker fails (see the sketch after this list). We have temporarily reverted to -g1 until this is investigated further.
  3. This is a known issue; we revised the launch configurations in #213.
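
For context, a minimal sketch of the pattern involved, assuming a .ramfunc linker section and illustrative names rather than Tinymovr's actual nvm code:

```c
#include <stddef.h>
#include <stdint.h>

/* Place the function in RAM via a dedicated linker section
 * (assumed here to be called .ramfunc). */
#define RAMFUNC __attribute__((section(".ramfunc"), noinline))

/* Hand-written copy loop intended to execute entirely from RAM while
 * flash is being written. At some optimisation settings GCC recognises
 * this idiom and replaces it with a call to memcpy, which resides in
 * flash, producing exactly the linker failure described above.
 * Building with -fno-tree-loop-distribute-patterns (or
 * -fno-builtin-memcpy) suppresses the substitution. */
RAMFUNC static void ram_copy(volatile uint8_t *dst,
                             const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        dst[i] = src[i];
    }
}
```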

Note that we are basing new feature PRs off the develop branch (which includes the new Avlos-based comms stack), the master branch atm only accepts hotfixes.

Also, is there a specific reason for suggesting avoiding -O3? I would love to hear more.

  1. No errors; calibration does not run - it just stays in the calibration state forever.
    Didn't manage to flash the prebuilt binary via VSCode, but did it with JLinkExe at address 0x1000 and it worked.
    My compiler version is arm-none-eabi-gcc (15:6.3.1+svn253039-1build1) 6.3.1 20170620. I suspect it is a compiler version problem. Can't seem to find which compiler version you use - would be good to know.
  2. Ah ok good to know. We're definitely using different compilers then as I don't get any errors.
  3. Didn't see that! Thanks for pointing me to it.

As for -O3, I am no expert, but from what I can see it comes with a few caveats that can actually make the program slower and the binary larger. Here is some discussion. Here is the arm gcc documentation, which states that -O2 is the default. It is definitely not a clear-cut case and requires some profiling/testing/digging, which is probably not worth it for this use case but is interesting!

As a side note, there is a file called "z_testbed_fw_r5" in each release; this is a release binary that includes the bootloader, which we use for flashing and testing all R5 boards. You can try flashing this; it should work with VSCode. But smart move flashing the upgrade binary with JLink!

In addition, I think that the issues you linked mostly refer to Cortex-A or desktop-class machines that have caches, where some features of -O3 would lead to more frequent cache invalidations and, as such, slow down execution. But as you mentioned, more profiling would give a more solid answer!
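
As an aside, if profiling ever showed that only specific functions regress at a given level, GCC supports per-function overrides via the optimize function attribute; a hedged sketch with an invented function name (GCC's docs advise against this attribute in production code, so it's best kept to experiments):

```c
/* Experiment: build the project at -O2 overall, but compile this one
 * (hypothetical) hot function at -O3 to compare the generated code. */
__attribute__((optimize("O3")))
void current_control_step(void)
{
    /* ... control math ... */
}
```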

I opened #222 after doing some research re. the -g2 flag. Did you manage to address the build issue?

Great! Didn't end up solving the issue; I'm just sticking to compiling with -O1. Which compiler version are you using?

I've used mainly 9-2019-q4-major in the past, but now I'm using the latest here: https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads

I am going to close this for now, and keep #222 for further investigation into the use of debug flags. I feel that the compiler optimisation issue has been resolved and was likely due to the compiler version; if not, feel free to reopen. Thanks again!

@yconst after coming back to this after a long time, I've realised it was due to a firmware bug. In scheduler.c, the state variable is not marked as volatile even though it is altered in interrupts, so optimisation levels higher than -O1 cause the WaitForControlLoopInterrupt function to break.
Confirmed that this fixes my problem on a test motor.
PR submitted here: #327
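
For the record, a minimal sketch of this bug class; WaitForControlLoopInterrupt is the real function name, while the flag and IRQ handler names are simplified stand-ins:

```c
#include <stdbool.h>

/* Without the volatile qualifier, the optimiser is free to cache this
 * flag in a register; at -O2/-O3 the wait loop below then never
 * observes the write from the interrupt and spins forever, matching
 * the "stuck in calibration" symptom. */
static volatile bool control_loop_pending = false;

/* Interrupt handler (illustrative name): signals that a new control
 * cycle is due. */
void ADC_IRQHandler(void)
{
    control_loop_pending = true;
}

/* Blocks until the interrupt flags a new control cycle. */
void WaitForControlLoopInterrupt(void)
{
    while (!control_loop_pending)
    {
        /* optionally __WFI() to sleep until the next interrupt */
    }
    control_loop_pending = false;
}
```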