majbthrd/pico-debug

pico-debug might interfere with a "no-flash" image


Hi,

Recently, I have observed some strange behaviour, and I suspect that pico-debug interferes with some SRAM areas during debugging.

RP2040_debug_in_sram.zip

Here are some facts:

  • The template.uf2 is a "no-flash" image

    • Link script is attached: template.lnp
    • map file is attached: template.map
    • original elf file is attached: template.axf
    • It is compiled with Arm Compiler 6.16
  • It works well using the normal drag&drop method

  • When I debug it with pico-debug (I have tried both pico-debug-gimmecache.uf2 and pico-debug-maxram.uf2), the image crashes into a HardFault.

  • Around memory address 0x2000AC7C, the content changes, even though this area is supposed to be ".text". The map file confirms this.

The project is compiled and debugged in MDK. I have created a tag for you to replicate this issue.

Since the image is smaller than 16KByte, you can use the Evaluation license of MDK to test it.

Here are the steps to replicate the issue in MDK:

  1. Open the project in path "project/MDK"
  2. (if it is not the currently activated project configuration) switch to the project configuration "AC6-DebugInSRAM"
  3. Compile
  4. Press the debug button to start the debug session.
  5. You should see a HardFault.

NOTE: Please use the latest MDK (5.35 or later) to test it.

I need your help.

I really appreciate any help you can provide.

Cheers,
Gabriel

I have tried two Picos bought from different vendors, so it is highly likely not an HW specific issue.

I found a workaround, but I am still not sure about the reason behind it.

I went through the process of downloading MDK, installing it, etc. However, when I got through all of that, the example code that you provided was unusable because it has an unmet dependency on an object file called "perf_counter.lib" as well as a similarly named include file.

I used another tool to examine the template.axf attachment. However, it is not clear that this is the same code you are talking about, since the only section anywhere near the 0x2000AC7C address that you report is a section called "RW_IRAM_CODE" that spans 0x20009D0C to 0x2000A14C.

Looking at your "startup_RP2040.c", what stands out as potentially problematic is the entry function. Because it is written in C, the very first instruction modifies the SP. Also, and most importantly, because the compiler assumes the SP has already been initialized, it proceeds to write and then read a value using an offset from the SP. All of this happens before the SP has been set.


In other words, it will corrupt a word-sized piece of RAM in the first few instructions.

Because your code leaves it to chance as to what the SP already happens to be, it does highlight a difference in behavior between loading the .uf2 directly and loading it via a debugger. When loading the .uf2 directly, the SP will be set to whatever the Boot ROM happens to have left it as. When using pico-debug, the SP is still set to the value pico-debug used when it was loaded via .uf2. Those two SP values are likely not the same.

Particularly when debugging, it is incumbent on the app to set registers to a known state before using them.

I'm not saying this is the only problem, but it is certainly one problem that stands out. Perhaps you can provide more complete code to test with?

The "perf_counter.lib" and "perf_counter.h" files can be found here:

https://github.com/GorgonMeducer/perf_counter/releases/tag/v1.5.0

The template.uf2 is converted from template.axf with "tool/elf2uf2.exe".

About the Reset_Handler:

__NO_RETURN void Reset_Handler(void)
{
    SCB->VTOR = (uintptr_t)__VECTOR_TABLE;
    __set_MSP((uintptr_t)(&__INITIAL_SP));
    //SystemInit();                 /* CMSIS System Initialization */
    __PROGRAM_START();              /* Enter PreMain (C library entry point) */
}

You are right: at -O0 it uses the SP. But when I compile it with -Os, the generated code is fine.


I'm not saying this code has no problems at all; it has the problem you mentioned, and I will fix it.
The key point is that even at -Os, we still suffer from a similar issue: after clicking the "Reset Pico" button in the ToolBox, we get a HardFault.

-- update --
A correction:

The key point is that even at -Os, we still suffer from a similar issue: after clicking the "Reset Pico" button in the ToolBox, the code crashes into some random place.

The debugger stops at the location below and never returns:
[screenshot]

Pico_Template-pico-debug-trouble-shooting.zip

I have generated a standalone MDK project package for you.
Let's use it as a starting point.

Thank you so much for being so helpful.

Regarding this point in the report: "around memory address 0x2000AC7C, the content is changed. And this area is supposed to be '.text'. The map file can verify this."

What method are you using to determine that the content has changed? Are you basing this on the Keil MDK message:

Warning: BKPT message at 0xXXXXXXXX externally modified! May have missed requested breakpoint


That message has a different cause; you can ignore it.

I can see that the content has changed by using the memory viewer.

Please follow the steps listed below:

  • Open the project attached earlier.

  • Enter a debug session.

  • Set a breakpoint at runtime_init();

  • Open the memory view.

  • Press Run (F5); it will stop at runtime_init();

  • Press the "Reset Pico" button.

  • Run (F5).

You can clearly see that the content has been changed.

We never start running the pico-sdk; we only go through __PROGRAM_START() (i.e. __main()), which leads to the ARM Compiler C Library Startup and Initialization:

https://developer.arm.com/documentation/dai0241/b

There are two possible reasons behind the issue we encountered:

  • Something is wrong with the ARM Compiler C Library Startup and Initialization. It could be the scatter-loading or a function declared with __attribute__((constructor)).
    or
  • An external bus master changed the content at that memory location.

I need your help with the second possibility; I am working on the first.

Thank you.

By the way, the content change can be clearly observed when setting the optimisation level to -O0.
And with the project I attached, you can clearly see that the SP corruption issue has been solved:

[screenshot]

So that is not the cause here. When I first reported this issue, all the snapshots and memory addresses were observed with -O0.

The value just after the debug session started:
[screenshot]

The value after a Pico reset:
[screenshot]

At the risk of stating something already obvious to you, there is also a project called DapperMime:

https://github.com/majbthrd/DapperMime

It is almost entirely the same code as pico-debug. The difference to pico-debug is that two RP2040 boards are needed for DapperMime. One RP2040 has the DapperMime .uf2 programmed to flash and it is wired to the second RP2040’s SWDIO, SWCLK, and GND pins.

I find it helpful to compare behavior between pico-debug and DapperMime (as well as other debug adapters if those are available).

I’m concerned that this problem may have something to do with a bus starvation issue between Core 0, Core 1, and the APB DP (debug port) on Core 0, rather than an errant memory pointer. I am going to continue to investigate the problem, but you shouldn’t bank on a quick fix.

I’m mentioning DapperMime because it will also work with Keil MDK (if you have two or more RP2040 boards… which you have already said you have) and having both is helpful for diagnostics to compare their behaviors (since the code is largely identical).

Certainly, using DapperMime would help you sort out issues (if any) with the ARM initialization code.

Alas, there is a roadblock to using DapperMime with Keil MDK. Despite Keil being owned by ARM, it doesn't support multi-drop SWD targets! I have what should be a workaround patch for this below (and I can see with a logic analyzer that Keil MDK correctly reads the IDCODE and starts setting up debugging), but Keil MDK eventually gives up with an "Invalid ROM Table". There are only so many hours in the day and I've run out of mine, but I am sharing what I have at this time.

diff --git a/bsp/rp2040/DAP_config.h b/bsp/rp2040/DAP_config.h
index 14bc40a..aa24233 100644
--- a/bsp/rp2040/DAP_config.h
+++ b/bsp/rp2040/DAP_config.h
@@ -235,6 +235,29 @@ __STATIC_INLINE void PORT_SWD_SETUP (void) {
   hw_write_masked(&padsbank0_hw->io[PROBE_PIN_SWDIO], PADS_BANK0_GPIO0_IE_BITS, PADS_BANK0_GPIO0_IE_BITS | PADS_BANK0_GPIO0_OD_BITS);
   iobank0_hw->io[PROBE_PIN_SWCLK].ctrl = GPIO_FUNC_SIO << IO_BANK0_GPIO0_CTRL_FUNCSEL_LSB;
   iobank0_hw->io[PROBE_PIN_SWDIO].ctrl = GPIO_FUNC_SIO << IO_BANK0_GPIO0_CTRL_FUNCSEL_LSB;
+
+#if 1
+  /* this #if block is for Keil MDK which (despite being an ARM company) can't handle multi-drop targets */
+
+  static const uint8_t sequence_alert[] = {
+    0xff, 0x92, 0xf3, 0x09, 0x62, 0x95, 0x2d, 0x85, 0x86, 0xe9, 0xaf, 0xdd, 0xe3, 0xa2, 0x0e, 0xbc,
+    0x19, 0xa0, 0xf1, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0xff, 0xff, 0xff, 0xff,
+    0xff, 0xff, 0xff, 0x00,
+  };
+  SWJ_Sequence(8*sizeof(sequence_alert), sequence_alert);
+
+  static const uint8_t packet_request[] = { 0x99 };
+  uint8_t trash;
+  static const uint8_t choose_target[] = { 0x27, 0x29, 0x00, 0x01, 0x00 };
+
+  SWD_Sequence(0x08, packet_request, NULL);
+  SWD_Sequence(0x85, NULL, &trash);
+  SWD_Sequence(0x21, choose_target, NULL);
+
+  /* set to default high level */
+  sio_hw->gpio_oe_set = PROBE_PIN_SWCLK_MASK | PROBE_PIN_SWDIO_MASK;
+  sio_hw->gpio_set = PROBE_PIN_SWCLK_MASK | PROBE_PIN_SWDIO_MASK;
+#endif
 }
 
 /** Disable JTAG/SWD I/O Pins.

Thank you for your suggestion.
I have received confirmation from the Keil team that, so far, the MDK debugger doesn't support multi-drop SWD, so pico-debug is the only workaround.

I have figured out a workaround for this issue, and based on it, I am starting to think this may be a scatter-loading issue.

Here is my solution:

Change the scatter-script from the one you currently see to this:

https://github.com/GorgonMeducer/Pico_Template/blob/main/project/mdk/RP2040_debug_in_sram.sct

The key takeaway is to move the

    RW_IRAM_CODE +0 {
        * (+RO-CODE)
        * (+XO)
    }

from the tail of the RAM area to a place right after ER_MUTEX_ARRAY.

I don't think the original one was wrong, because they are absolute execution regions.

https://developer.arm.com/documentation/101754/0616/armlink-Reference/Scatter-File-Syntax/Execution-region-descriptions/Considerations-when-using-a-relative-address--offset-for-execution-regions?lang=en

https://developer.arm.com/documentation/101754/0616/armlink-Reference/Scatter-File-Syntax/Execution-region-descriptions/Inheritance-rules-for-execution-region-address-attributes?lang=en

But maybe I am wrong.

With this workaround, both issues, i.e.

Warning: BKPT message at 0xXXXXXXXX externally modified! May have missed requested breakpoint

and the "memory content changed" issues are gone.

The BKPT issue hints that RW_IRAM_CODE +0 may not be treated as an absolute region (I believe its load address and execution address are the same, hence it is an absolute region):

  • We stop at Reset_Handler, and the debug script sets a breakpoint at the entry of main() (placing a BKPT instruction there).
  • When we start running, scatter-loading copies code from the load region to the designated execution address. This copy overwrites the BKPT instruction there.

What puzzles me about the second issue is that even if RW_IRAM_CODE is treated as "RAM code", there shouldn't be any problem, because normal scatter-loading should copy the right content to the right location.

So I don't know why. The good news is that the current workaround (i.e. using a modified scatter file) works well, so don't worry.

Thank you for your help!