ARM-software/CMSIS_5

RTX thread stacks: GDB stack unwind runs beyond stack limit

udoe opened this issue · 17 comments

udoe commented

Hi,

When gdb stops on a breakpoint in thread context and I list stack frames (via bt command, or my IDE issues a -stack-list-frames command) then gdb's stack unwind algorithm walks up the stack but does not stop at the stack top address. Instead it walks two words (8 bytes) beyond stack top. This causes gdb to issue memory reads from arbitrary (and invalid) addresses. In my specific setup this is causing serious issues which are off-topic here.

I added the command
set debug frame 1
to the gdb init sequence and captured a trace from gdb stack unwinding, see gdb_output_snippet.txt

My stack area is 0x20001000..0x20002000, so stack top is at address 0x20002000. The log shows that gdb reads from 0x20002004 and 0x20002008.
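
To make the numbers concrete, here is a minimal C sketch that checks whether a 4-byte debugger read stays inside the stack area. Only the two addresses come from the description above; the names `STACK_BASE`, `STACK_TOP`, and `stack_contains` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stack area from the report: 0x20001000..0x20002000 (top is exclusive). */
#define STACK_BASE 0x20001000u
#define STACK_TOP  0x20002000u

/* Hypothetical helper: true if a 4-byte read at addr lies fully inside the stack. */
static bool stack_contains(uint32_t addr)
{
    assert(STACK_BASE < STACK_TOP);
    return addr >= STACK_BASE && addr <= STACK_TOP - 4u;
}
```

With these bounds, the last valid word is at 0x20001FFC, so the reads from 0x20002004 and 0x20002008 seen in the log both fall outside the stack.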

I suspect the issue is caused by the specific way the initial stack frame is constructed in svcRtxThreadNew. There seems to be no initial top-most frame left when the thread runs (But I'm not an expert in Cortex-M stack unwinding).

This issue does not occur with another RTOS (ThreadX). They seem to build the initial stack frame in a different way, see tx_thread_stack_build.S, and they also use a thread entry routine.

Udo

Hi Udo,

This might indeed be caused by the way RTX5 sets up the stack for threads. The first stack frame added by RTX5 contains a "return to osThreadExit". Once we unwind the call stack down into this function, there is no additional stack frame anymore. This might cause confusion.

I guess, @RobertRostohar can give us more details.

Cheers,
Jonatan

udoe commented

Hi Jonatan,

Thanks for following up. Meanwhile I implemented a workaround: At the top of the stack I reserve two zero-initialized words (8 bytes in total) of which RTX is not aware. This stops GDB's stack unwinding, and GDB no longer stumbles across invalid memory addresses. However, I'm not sure if this is the correct way to build a top-most stop frame on a Cortex-M stack.

Udo

Hi Udo,

I think we need to wait for Robi to check the differences between RTX5 and the ThreadX example.

By glancing over it, I recognized that ThreadX puts an additional magic LR value (return from interrupt) to the top of the stack. In case of RTX5 we just return into osThreadExit.

Perhaps the missing part in RTX5 is that GDB cannot detect that osThreadExit will never return and tries to unwind another frame. We'd need to figure out what type of stack frame would cause GDB to stop the unwind process. Do you have any experience in this area to help us work on a fix?

Thanks,
Jonatan

Hi,

As you have already figured out for RTX5, when a thread starts there is no stack frame on top of the stack left and the register LR contains the function osThreadExit (a mechanism that allows self-terminating a thread when its function exits).

The value in LR points to a valid function and therefore gdb does not stop stack unwinding.

ThreadX uses 0xFFFFFFFF (EXC_RETURN) for LR, which seems to stop gdb's stack unwind algorithm.

Putting two zero-initialized words on top of the stack also seems to stop the unwind process.

We will investigate whether there is a better method with regard to osThreadExit that would not cause issues with stack unwinding. In the meantime, adding the two zero-initialized words on top of the stack can be an alternative.

udoe commented

Hi Jonatan, Hi Robert,

I downloaded the GDB sources and spent some time analyzing the code to get an idea of how stack unwinding works. In case you want to take a look as well, here are some starting points:

The command -stack-list-frames seems to be implemented in mi_cmd_stack_list_frames() in binutils-gdb/gdb/mi/mi-cmd-stack.c

The functions get_current_frame() and get_prev_frame() in binutils-gdb/gdb/frame.c do the actual unwinding. At the end of get_prev_frame() you can see that a PC value of zero stops the process. Maybe this is why the place-two-zero-initialized-words workaround works.
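
The stop condition can be illustrated with a toy model in C. This is not GDB's code: `toy_frame`, `toy_unwind`, and the linked-frame layout are hypothetical, and the only behavior borrowed from the description above is that a PC value of zero ends the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record: a saved PC plus a link to the caller's frame. */
struct toy_frame {
    uint32_t pc;
    const struct toy_frame *prev;
};

/* Walk from the innermost frame outwards and count the frames visited.
 * The walk ends when the chain ends, when max_frames is reached, or when
 * a frame with PC == 0 is found (that frame is not counted). */
static size_t toy_unwind(const struct toy_frame *innermost, size_t max_frames)
{
    assert(max_frames > 0u);
    size_t count = 0;
    for (const struct toy_frame *f = innermost;
         f != NULL && count < max_frames;
         f = f->prev) {
        if (f->pc == 0u) {
            break; /* zero PC: treat this as the end of the stack */
        }
        count++;
    }
    return count;
}
```

In this model, a zero word in the PC slot ends the walk cleanly, whereas any non-zero filler value (such as 0xEEEEEEEE) is treated as one more frame to follow.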

I did not come across code that checks LR for the EXC_RETURN magic value. Maybe I overlooked it, or it is implemented in some arm-specific parts of the sources, or in some assembly code...

Note: I found the GDB command "set debug frame 1" helpful. It enables the frame_debug_xxx print statements. See also my first post.

Udo

Hi,

Just want to inform you that we are working on this issue. It would be helpful to have a project and steps to reproduce the problem. Is this possible for you @udoe ?

Regards,
Domen

udoe commented

Hi,

Yes, I can create an MCUXpresso demo project for you. Which eval board do you have available? RT1060? RT1170?

Udo

Great, I have the RT1060 available. May I ask which tool you use to connect to the board on one side and to GDB on the other? I have been using OpenOCD so far.

Domen

udoe commented

Okay, I will prepare a test project for the RT1060 EVK. But this might not be available before the end of next week (Oct 14), sorry.

I use MCUXpresso (which has OpenOCD built in I think) plus Segger J-Link. But the EVK has an integrated debug adapter which connects with MCUXpresso directly. No extra tools are needed.

Udo

udoe commented

Hi Domen,

I was able to reproduce the issue using MCUXpresso and the RT1060EVK board with integrated debug adapter (DAP). I attached my demo project as evkbmimxrt1060_igpio_led_output.zip.

Here you can see how the stack unwinding yields an invalid address 0xeeeeeeee when it reaches the top of the stack. This address is taken from the memory area beyond the top of the stack.

stack_unwind1

This is a memory dump of the stack area:

stack_mem

The stack area passed to RTX ends at 0x20001FF0. The unwinding algorithm picks up the 0xeeeeeeee address from the reserved area after the stack space.

In the LinkServer debug log you can see that it rejects access to the invalid address:
Em(12). Target rejected debug access at location...
This is because in case of MCUXpresso some inaccessible memory areas are configured in gdb. However, the IDE I'm using (VisualGDB) does not do that and when gdb executes the invalid access this locks up the internal bus of the MCU (RT1170 in my case) which is fatal.

SEGGER has confirmed that an access to invalid memory addresses can lock up the CPU core, see also
https://forum.segger.com/index.php/Thread/8723-i-MXRT1176-Reading-from-invalid-memory-address-breaks-all-subsequent-reads/

Steps to reproduce the problem in MCUXpresso:

  • In the Launch configuration, LinkServer Debugger page, set Debug level to 4.
  • Set a breakpoint in the MainThread while loop.
  • Launch the debugger and let it run until the breakpoint is hit.
  • Set the Console window to "evkbmimxrt1060_igpio_led_output Debug messages" and click on Pin Console.
  • Clear the console and hit F8 to execute one round in the thread's loop. In the console window you can see the "Target rejected debug access at location 0xEEEEEEEC" messages. Note also that the stack backtrace window shows 0xeeeeeeee at the bottom.
  • You can open a second console and set it to "evkbmimxrt1060_igpio_led_output Debug messages [...] gdb traces". Here you can watch the communication between IDE and gdb.

Note: I found that with the gdb version that comes with MCUXpresso the command
set debug frame 1
I mentioned above does not work. It leads to an internal error (assertion failure) in gdb. So you cannot use it to make the stack backtracing routines print more detailed info. With an older gdb (8.3.0.20190709) that I was using originally, the command worked.

If you need assistance with MCUXpresso, let me know. It's not always easy to use when it comes to the non-trivial stuff.

Best regards,
Udo

evkbmimxrt1060_igpio_led_output.zip

udoe commented

Hello,

Is there any news? Did you receive my demo project?

Udo

Hi Udo,

I received your project and was able to reproduce the problem. We tried a few solutions, but none of them solved it.

  • RTX uses a trick to support self-terminating a thread and we assumed that this might confuse GDB. Removing that does not help.
  • When a thread is created and prepared to be launched, we prepare the basic stack frame R0..R3, R12, LR, PC, xPSR. We set LR to osThreadExit (self-terminate) and PC to the thread entry function. Setting LR to a different value (0, 0xFFFFFFFF, or 0xFFFFFFFD) did not make any difference.
  • The GDB source code (get_prev_frame() in frame.c) shows that the process stops if the variable frame_pc equals 0, which denotes an illegal address, so this is also not relevant.
  • The uVision debugger stops the unwinding by means of stop symbols, which are known to exist in the debug info of the loaded application.
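
For illustration, the basic frame named in the list above (R0..R3, R12, LR, PC, xPSR) can be sketched in host-side C. This is only a sketch under stated assumptions: `init_frame` and the `FRAME_*` names are hypothetical, the addresses are passed as plain integers, and the actual RTX5 code is not shown in this thread and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets of the basic Cortex-M exception frame, lowest address first. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
    FRAME_WORDS
};

#define XPSR_THUMB 0x01000000u /* T bit, must be set on Cortex-M */

/* Hypothetical helper: build the initial frame in the top 8 words of a
 * stack buffer and return the resulting stack pointer. */
static uint32_t *init_frame(uint32_t *stack, size_t words,
                            uint32_t entry, uint32_t arg, uint32_t lr)
{
    assert(words >= (size_t)FRAME_WORDS);
    uint32_t *sp = stack + words - FRAME_WORDS;
    for (size_t i = 0; i < (size_t)FRAME_WORDS; i++) {
        sp[i] = 0u;
    }
    sp[FRAME_R0]   = arg;        /* thread argument */
    sp[FRAME_LR]   = lr;         /* e.g. address of osThreadExit */
    sp[FRAME_PC]   = entry;      /* thread entry function */
    sp[FRAME_XPSR] = XPSR_THUMB;
    return sp;
}
```

Because the frame occupies the top eight words exactly, the stack pointer equals the stack top once the frame has been popped, so anything an unwinder reads above that point lies outside the stack.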

We also could not reproduce your workaround of placing two zero-initialized words on top of the stack to stop the unwinder. We hope you can tell us how to replicate this as well.

Regards, Domen

udoe commented

Hi Domen,

I was able to replicate my workaround in the demo project. For this, I changed the thread setup code as follows:

    osThreadAttr_t tattr;
    memset(&tattr, 0, sizeof(tattr));
    tattr.name = "MAIN";
    tattr.priority = osPriorityNormal;
    tattr.stack_mem = gThreadStack;
    // To demonstrate the problem, preserve 16 bytes filled with 0xEE at top of stack.
    tattr.stack_size = sizeof(gThreadStack) - 16;
#if 1
    // Workaround: Place two zero-initialized words at top of stack.
    tattr.stack_size -= 8;
    uint32_t* p = (uint32_t*)&gThreadStack[tattr.stack_size];
    p[0] = 0;
    p[1] = 0;
#endif
    osThreadNew(MainThread, NULL, &tattr);    // Create application main thread

When the breakpoint is hit, this is the result:

stack_unwind_2

Note that gdb now shows 0x0 at top of stack (red markers) and that the "Em(12). Target rejected debug access at location..." messages disappear in the evkbmimxrt1060_igpio_led_output Debug messages log (blue marker).

As for the self-terminating behavior: Can't you achieve this by using an internal thread entry function which looks like this:

void ThreadEntry(osThreadFunc_t func, void *argument)
{
    (func)(argument);
    osThreadExit();
}

Best regards,
Udo

udoe commented

Hi again,

I also tried a simple AzureRTOS sample which comes with the NXP SDK:
Import SDK examples --> azure_rtos_examples --> threadx_demo

Here is a screen shot from the IDE when a breakpoint in thread_0_entry is hit:

azrtos_breakpoint

You can see that the backtrace stops at 0xfffffffe which is the magic LR value. No invalid memory access occurs.

Udo

udoe commented

Hello,

May I ask for a status update? If I can provide further info, let me know.

Udo

Hi Udo,

Sorry for the longer response time.

We have analyzed the issue again and tested the possible solutions.

Using a thread entry wrapper (with the self-terminating behavior) sounds like the optimal solution in this case (rather than increasing the stack by 8 bytes). However, the wrapper should not use any stack itself (it will probably be implemented as inline ASM).

We plan to add the thread entry wrapper in the coming weeks.

Thanks for all your help!

The thread entry wrapper has been added. The problem with GDB stack unwinding should be resolved now.