usbcnc/grbl

Self-compiled project works not reliable

lakeroe opened this issue · 42 comments

Hello USBCNC,

I flashed an older hex file you provided here on a blue pill STM32F103C8T6 board, installed ST's VCP driver and did a first test.
I streamed a pretty big g-code file (860.000 lines, running time ~2hours) using bCNC -> this works without a problem.

However, the same test using a self-compiled file only works for a few minutes but after some random time (between 5 and 15 minutes) it hangs.

I already tried different compiler options and optimization levels but can't get it to work reliable.
A next test is to use an older compiler. I currently use 6.3.1-2017q2 and I think you used 4.9-2015q3 as I saw here.

Do you have any further ideas ?

I'm interested in a stable and reliable GRBL STM32 port, so any help is highly appreciated.

Thanks and best regards,
lakeroe

Hello together,

after some more testing I finally got it to work.
I compiled the project using different compilers and 100% same settings.

  1. GCC V6.3.1 20170620
    Binary size: 38960 Bytes -> NOT working

  2. GCC V4.9.3 20150529
    Binary size: 38720 Bytes -> working

It would be interesting to find the root of the problem.
My best guess is some compiler optimization changes.

Let me know if you're interested in further investigation, I'm willing to help if you like !

Best regards,
lakeroe

makefile.bat

I am working on another project of STM32. I think compile option does makes difference for USB performance. In the mean time there is known issue regarding the resistor used on the board (should be 1.5K not 10k). For me it works better if I hook up with a USB 2.0 hub.

@usbcnc Did you mean the pull resistor between USB D+ and VDD ?

@radiolee Yes. The soldered resistor is 10K. It will work on some PC but not work on the others.

@usbcnc Thanks! My stm32f103c8t6 mini board is on the road. I'll check it when I get it.
BTW, what version of toolchain are you using, still 4.9 2015q3 or later?
I use 2015q3 and build with default configuration:
Program Size:
text data bss dec hex filename
37396 1324 5228 43948 abac stm32grbl.elf

@usbcnc I got the board with 1.5k pullup res.
1st run without stepper driver, looks 2x faster with same settings in cncjs. thanks for your great work!

I got my board with 10k pullups and changed them to 1.5k but the problem remains the same.
It works reliable using GCC V4.9.3 20150529 and unreliable using GCC V6.3.1 20170620.
Both using exactly the same compilation settings (see my makefile above).

Regards,
lakeroe

@lakeroe I'm not sure I saw 2016 or 2017 version @LinuxCNC used, maybe in gnea/grbl ARM discuess, but can't find it now.
Did you try CooCox IDE with latest gcc? I was try cygwin but don't know howto😌

I am not expert of tool chain. I use what works for me for now. If one works and the other does not , I can assume the other one has a bug or so. If you really want, disassemble the elf and find the difference.

I have noticed in the past some compiler will optimize differently the following code.
While(some flag!=true)
;
The code is meant to wait the flag change, the change of flag is in interrupt routine.
There are two places in serial.c that use this kind of code.
If they are improperly optimized, then serial code will not reliablely work.
Suggest to use something like #progma to disable the optimize around those two places.

O3 might work better than optimize for size.

I do have latest coocox installed but not the latest tool chain. I still prefer to use older ide since I get used to it. Do not find need to have to use the latest one.

Thanks for ur reply. I'm still use old version

There are two places in serial.c that use this kind of code.
If they are improperly optimized, then serial code will not reliablely work.
Suggest to use something like #progma to disable the optimize around those two places.

I don't use the serial interface, but USB instead. So in my understanding this should not be a problem.

O3 might work better than optimize for size.

As I wrote in my first post I already tried different optimization levels without luck ...

I guess @usbcnc point out While(some flag!=true) in serial.c maybe differently optimized by latest gcc, and the point is in interrupt routine. If some diff in interrupt, it may makes random issues.

Try to disable compile option and compare the result between the two compilers. I do not have installed the latest tool.

I'd love to find the root cause of the problem, but I think my knowledge is not enough for that.
I just compared the result between the two compilers (the .ASM files) and a simple text compare shows hundreds (or thousands) of differences. Maybe there's a better approach to find the differences ?

It should related to the serial port code a(USB). You can try to user real serial port (not USB VCP) to see if both tool works the same.

Today I could narrow down the problem and @usbcnc you're right: The problem seems related to the USB VCP code.

Real serial port (using an external USB-RS232-converter from FTDI) works fine.
USB VCP and compiler option -O0 -> works fine
USB VCP and compiler option -O1 -> works fine
USB VCP and compiler option -O2 -> connection unreliable
USB VCP and compiler option -Os -> connection unreliable

So it really seems a compiler optimization problem.

I try to further isolate the problem, but this will take some time.
If you have any further idea, please let me know ...

Have you tried O3?

Yes I tried -O3, it doesn't work either.

The problem seems to be the -fschedule-insns2 optimization flag which is automatically enabled at -O2, -O3 and -Os.
By adding -fno-schedule-insns2 in addition to -Os the code works reliable.
I already compared the two generated codes (*.ASM files), and they are quite different.

This leads to the question, what's the root of the problem:
*) a compiler bug ?
*) bad c coding style ?
*) any other ideas ?
*) ...

I would check the code generated for serial.c and usb_endp.c.
Possible reason are

  1. Synchronization.
  2. (This is a bug in all VCP code that a zero byte needs to send to host if multiple of 64 bytes are send to the host and no further IN bytes are send). This however should not be affected by the optimization level.

main.c has no issue. As I said in serial.c and usb_endp.c

Sorry, I noticed that myself and here are the correct files again:

serial-not-okay.asm
serial-ok.asm
usb_endp-not-okay.asm
usb_endp-ok.asm

I think there is a bug in the compiler.
Look at serial-not-okay.asm
1a: 70d3 strb r3, [r2, #3]
1c: 5460 strb r0, [r4, r1]

Wrong order. Should swap the lines.
But in the ok asm
1a: 5460 strb r0, [r4, r1]
1c: 70d3 strb r3, [r2, #3]

The c code is
serial_tx_buffer[serial_tx_buffer_head] = data;

serial_tx_buffer_head = next_head;

Which to me the OK one generate the right code.
You can go to binary code and modify those two places and see if the problem disappears.
The order is wrong in here. There maybe other places that the compiler generate the wrong code,

In the above code, the order is VERY important, if the order is wrong the result can be unpredictable.

Try change to this and see if the problem is fixed.
__disable_irq();
// Store data and advance head
serial_tx_buffer[serial_tx_buffer_head] = data;

serial_tx_buffer_head = next_head;
__enable_irq();

By disabling the IRQs the program works fine now.
I just ran a ~2h program without a problem.
The two lines are still in the wrong order though.

Thanks for your help !

Maybe it's a good idea to update the git sources ...

The original source is good. But the optimization has issue not the code itself.
Disable the IRQ just to prove that optimization does the wrong thing and should not be trusted.

Original GRBL is carefully written and these two lines have a good reason to be in this order. If compiler changes the order, that means it cannot be trusted.
I think there is a way to work around this issue but is it worth the effort?

Original code says "// Store data and advance head"
But the compiler made it "advance the head and store the data". Even though the data is stored in the right place, but it is in the wrong timing.

https://developer.arm.com/open-source/gnu-toolchain/gnu-rm/downloads

why not try 2017q4 or other version to check if the problem fixed?

I just tried the new gcc (2017q4) and the problem is still the same.
So in your opinion is adding the -fno-schedule-insns2 parameter the best way to solve this issue for newer compilers ?

And one more question just for my understanding:
Even if the compiler changes those two lines (which in my understanding causes the data is stored in the WRONG place) why is it working if IRQs are disabled ?

@lakeroe I think disable irq just a debug code, bypass something to test. I really don't know what's different use -fno-schedule-insns2 so can't help.

@lakeroe, I do not know your educational background. If you look at the usb_endp.c which use the flag (in ISR), you should know this order is very important. Disable the IRQ means no interrupt will happen between two lines and the problem will not happen. The problem will happen ONLY if the interrupt is happening between those two lines.

I've learnt programming microcontrollers in C 25 years ago and apart from some hobby projects my knowledge is quite basic. I'm currently working as an electronics hardware developer (mainly microcontroller circuits).

Thanks for the explanation, I think I understand the problem now.

Regarding the thread you mentioned above:
This one is from 2009 and the problem still persists in newer gcc versions - sounds very strange to me.

Anyway, my solution is to use your unchanged code and add the -fno-schedule-insns2 flag.

I have read and I do not believe this is a bug for compiler. It just means in this case, this optimization needs to be disabled for those two lines. I think there is a better way to disable this option only for those two lines. -fno-schedule-insns should go with -fno-schedule-insns2. They both do so called "instruction scheduling".

By adding "attribute((optimize("-O0")))" it should be possible to completely deactivate optimizing for a specific function, but I think that's not exactly what we want ...

@lakeroe that will do the trick and should be used. It is better than disable the IRQ.

Just for your information:
I tried the attribute thingy from above and it also works without problems ...

Do you think that should be added to your source ?

@lakeroe Yes I think so.

Just for your information:

__attribute__((optimize("-O1"))) works also fine