/CrashCatcher

Catch Hard Faults on Cortex-M devices and save out a crash dump to be used by CrashDebug.

Primary LanguageC++Apache License 2.0Apache-2.0

Table of Contents

Overview
How To Use Crash Dumps
How it Works
CrashCatcher Libraries
Dump Format
How to Clone
How to Build

Overview

CrashCatcher is code which can be included in Cortex-M microcontroller firmware to catch Hard Faults and generate crash dumps to be used by CrashDebug for post-mortem debugging. This can be useful:

  • for diagnosing crashes when it isn't convenient to have a debugger attached.
  • for capturing faults in the field to be sent back to developers for further investigation.
  • for capturing faults during testing to be further investigated at a later point in time.
  • for attaching to bug reports to provide more information about an issue being encountered.

The core code in CrashCatcher knows how to capture the register and RAM state at the time of a crash. It will then call developer provided functions to dump data in a convenient format. This could be dumping the data to a serial port or generating a file on a SD card or LocalFileSystem (for mbed devices which have such built-in storage.)

How To Use Crash Dumps

The crash dumps generated by CrashCatcher are to be loaded and used with the CrashDebug utility. More information on how CrashCatcher dumps are to be used with CrashDebug can be found here.

How it Works

In addition to the documentation below you can also check out Cyril Fougeray's great blog post where he describes using the CrashCatcher library as a part of his logging solution.

Catching the Hard Fault

A typical CMSIS based build system for Cortex-M parts will have a hard fault interrupt vector which calls a function with the following prototype:

extern "C" void HardFault_Handler(void)

The default CMSIS HardFault_Handler() is a weak function that typically does nothing more than infinite loop. However the developer can link in a HardFault_Handler() function that does more. The CrashCatcher library provides two assembly language versions of this HardFault_Handler() function, one for ARMv6-M cores and another for ARMv7-M cores.

https://github.com/adamgreen/CrashCatcher/blob/master/Core/src/CrashCatcher_armv6m.S
https://github.com/adamgreen/CrashCatcher/blob/master/Core/src/CrashCatcher_armv7m.S

These assembly language routines switch to using a dedicated g_crashCatcherStack stack (incase the crash was caused by stack corruption) and then it pushes registers to this stack before calling CrashCatcher_Entry(). Cortex-M processors will automatically push r0-r3, r12, lr, pc, and xpsr to the stack before running the HardFault_Handler() code. The assembly language routines therefore only need to push the rest of the registers.

CrashCatcher_Entry()

The CrashCatcher_Entry() routine is the core crash dumping routine and can be found here. It is written in C and knows how to dump the state of Cortex-M0/M3/M4 processors using the registers that were stacked before it was called. The HardFault_Handler() assembly language code passes in a pointer to where it stacked these registers of interest. How CrashCatcher_Entry() presents the crash dump will depend on a developer's particular hardware setup. For example, the developer may want it to be dumped to a serial interface or stored on some nonvolatile storage medium. As CrashCatcher gathers the crash state, it will call routines provided by the developer to actually dump this state.

Developer Provided Routines

This section describes the routines that the CrashCatcher core expects to be provided by the developer to facilitate crash dumping on their particular hardware setup. There is more information in the main CrashCatcher header file.

Core Routines

The following table lists the routines that are required by the Core CrashCatcher module to generate a crash dump. Implementing these functions provide the developer with the most flexibility for where the crash dump will be recorded.

CrashCatcher_DumpStart() Called at the beginning of a crash dump. The developer should provide an implementation which prepares for the dump by opening a dump file, prompting the user, or whatever makes sense for their scenario. A pointer to a CrashCatcherInfo structure is passed into this function. This lets the developer know things about the current fault such as whether it was caused by a hard coded breakpoint instruction and the current value of the Stack Pointer.
CrashCatcher_GetMemoryRegions() Called to obtain an array of regions in memory that should be dumped as part of the crash. This will typically be all RAM regions that contain volatile data. For some crash scenarios, a developer may decide to also add peripheral registers of interest (ie. dump some ethernet registers when encountering crashes in the network stack.) If NULL is returned from this function, the core will only dump the registers. A developer might want to take advantage of the Stack Pointer value passed into CrashCatcher_DumpStart() to only return a region for the currently used stack. This would result in a mini dump where only the call stack and local variables can be accessed by GDB (globals wouldn't be accessible.)
CrashCatcher_DumpMemory() Called to dump the next chunk of memory (this memory may contain the contents of registers which have already been copied to memory by CrashCatcher.) The element size will be 8-bits, 16-bits, or 32-bits. The implementation should use reads of the specified size since some memory locations may only support reads of the indicated sized.
CrashCatcher_DumpEnd() Called at the end of a crash dump. The developer should provide an implementation which cleans up at the end of dump. This could include closing a dump file, blinking LEDs, and/or infinite looping. It is typically only safe to return CRASH_CATCHER_EXIT if the call to CrashCatcher_DumpStart() indicates that the cause of the fault is a hard coded breakpoint.

HexDump Routines

Often one of the first peripherals that developers get up and running on their hardware is the UART. The above list of functions may seem like a lot for a developer to implement when the only mechanism they have for communicating the crash information to the user is the UART. CrashCatcher provides a HexDump module which makes it easier for a developer to utilize their existing UART driver software for capturing the crash dump. The following table shows the simpler list of routines that have to be provided by a developer if they are using the HexDump module:

CrashCatcher_GetMemoryRegions() Called to obtain an array of regions in memory that should be dumped as part of the crash. This will typically be all RAM regions that contain volatile data. For some crash scenarios, a user may decide to also add peripheral registers of interest (ie. dump some ethernet registers when you are encountering crashes in the network stack.) If NULL is returned from this function, the core will only dump the registers.
CrashCatcher_getc() Called to receive a character of data from the user. Typically this is in response to a "Press any key" type of prompt to the user. This function should be blocking.
CrashCatcher_putc() Called to send a character of hex dump data to the user.

The following is an excerpt of what the HexDump module would output when a crash is encountered. It first notifies the user that a crash has been encountered and then prompts them to press any key to start the dumping process. Once the user sends any keystroke to the device, the hexadecimal dump of text begins. At the end it loops and prompts the user again, incase they missed some data from the previous dump.

CRASH ENCOUNTERED
Enable logging and then press any key to start dump.

63430300
00000000
00000000010000000200000003000000
04000000050000000600000007000000
08000000090000000A0000000B000000
0C000000
687F0010
FFFFFFFF0003000000000001
487F00107809001003000000
0000001000800010
00BE0AE00D782D0668400824400000D3
...
28ED00E03CED00E0
00000100000000400800000034ED00E0
38ED00E0

End of dump


CRASH ENCOUNTERED
Enable logging and then press any key to start dump.

CrashCatcher Stack

When dumping the information about a crash, CrashCatcher sets the stack pointer to an area of memory reserved for this purpose. It uses its own stack as stack corruption may have been what lead to the crash in the first place. This reserved stack area is named g_crashCatcherStack and can be found in the Core/src/CrashCatcher.c source file. Its size is determined by the CRASH_CATCHER_STACK_WORD_COUNT macro in the Core/src/CrashCatcherPriv.h header.

The CrashCatcher stack must be large enough to run CrashCatcher's code, any routines you provide, and any code that your routines end up calling. The current size specified by CRASH_CATCHER_STACK_WORD_COUNT may not be enough depending on the nature of your routines. If you need a larger stack, you can modify the CRASH_CATCHER_STACK_WORD_COUNT macro in Core/src/CrashCatcherPriv.h or provide it on the command line of your C compiler and assembler (using the -DCRASH_CATCHER_STACK_WORD_COUNT=### command line option) when building the .c and S sources in this library.

When should you consider increasing the size of this stack?

  • If you ever see ACCE55ED bytes at the end of a dump, then CrashCatcher has detected that its stack was too small and you need to increase it.
  • If you get unexpected hangs or other odd behavior when attempting to write out your crash dump, then you should try increasing this value to see if it remedies the problem.

Developer Routine Examples

This CrashCatcher project includes a few examples of how to implement the above mentioned developer routines. These examples were written and tested on mbed devices.

mbed LocalFileSystem

The mbed LPC1768 and LPC11U24 devices contain a 2MB FLASH drive which is accessible from the PC as a USB mass storage device and to the Cortex-M processor itself via the LocalFileSystem driver. The LocalFileSystem sample uses this driver to create a CRASH.DMP file on the FLASH drive when a crash is encountered. The following table gives an overview of what this sample does for each of CrashCatcher's required routines:

CrashCatcher_DumpStart() It opens/creates the CRASH.DMP file. The g_coreDumpFile global stores the file handle to be used in subsequent write calls.
CrashCatcher_DumpMemory() It writes out the contents of the specified area of memory to CRASH.DMP
CrashCatcher_DumpEnd() It closes CRASH.DMP and enters an infinite loop.

StdIO based HexDump

The StdIO sample just delegates CrashCatcher calls to use the Standard C Library's routines for interacting with stdin and stdout. The following table gives an overview of what this sample does for each of CrashCatcher's required routines:

CrashCatcher_getc() Calls getchar() from the Standard C Library.
CrashCatcher_putc() Calls putchar() from the Standard C Library.

If the developer has other routines which interact directly with the UART hardware then those could be used instead.

CrashCatcher_GetMemoryRegions() Sample

Neither the LocalFileSystem or StdIO samples mentioned above have an actual CrashCatcher_GetMemoryRegions() implementation. This was done so that they could run on devices with differing RAM layouts. The CrashApp sample has an implementation of this routine which returns an array of memory regions to dump for NXP LPC1768 or LPC11U24 devices.

CrashCatcher Libraries

When you build CrashCatcher it will produce libraries for the ARMv6-M (Cortex-M0) and ARMv7-M (Cortex-M3, Cortex-M4, etc) architectures. The following sections provide an overview for each of these libraries, including a list of functions that the developer is expected to provide themselves for the library to function properly.

Cortex-M0

Library Description Developer Provided Functions
/lib/armv6-m/libCrashCatcher_armv6m.a Core functionality only CrashCatcher_DumpStart()
CrashCatcher_GetMemoryRegions()
CrashCatcher_DumpMemory()
CrashCatcher_DumpEnd()
/lib/armv6-m/libCrashCatcher_HexDump_armv6m.a Hex formatted dump CrashCatcher_GetMemoryRegions()
CrashCatcher_getc()
CrashCatcher_putc()
/lib/armv6-m/libCrashCatcher_LocalFileSystem_armv6m.a mbed-LPC11U24 LocalFileSystem example CrashCatcher_GetMemoryRegions()
/lib/armv6-m/libCrashCatcher_StdIO_armv6m.a Newlib stdin/stdout example CrashCatcher_GetMemoryRegions()

Cortex-M3/M4

Library Description Developer Provided Functions
/lib/armv7-m/libCrashCatcher_armv7m.a Core functionality only CrashCatcher_DumpStart()
CrashCatcher_GetMemoryRegions()
CrashCatcher_DumpMemory()
CrashCatcher_DumpEnd()
/lib/armv7-m/libCrashCatcher_HexDump_armv7m.a Hex formatted dump CrashCatcher_GetMemoryRegions()
CrashCatcher_getc()
CrashCatcher_putc()
/lib/armv7-m/libCrashCatcher_LocalFileSystem_armv7m.a mbed-LPC1768 LocalFileSystem example CrashCatcher_GetMemoryRegions()
/lib/armv7-m/libCrashCatcher_StdIO_armv7m.a Newlib stdin/stdout example CrashCatcher_GetMemoryRegions()

Linking to CrashCatcher Libraries

Once a developer knows which of the above libraries they want to use, they have to instruct the GNU linker to link it into their firmware. Due to how the GNU linker handles the weak references to the HardFault_Handler(), there is a trick that should be used to make sure that CrashCatcher's assembly language based version of HardFault_Handler() gets used. On the linker command line surround the references to the CrashCatcher library with -Wl,-whole-archive and -Wl,-no-whole-archive flags. The following shows an example of linking in the LocalFileSystem CrashCatcher library for a LPC1768 based sample:

arm-none-eabi-g++ -mcpu=cortex-m3 -mthumb -Wl,--gc-sections -TLPC1768.ld main.o -Wl,-whole-archive /lib/armv7-m/libCrashCatcher_LocalFileSystem_armv7m.a -Wl,-no-whole-archive -o LPC1768/HelloWorld.elf

Dump Format

The following sections describe the bytes found in the dump output. The HexDump module creates text output which contains two hexadecimal digits per byte of the dump.

Header

The first 4 bytes of a dump contain a header with the following format:

Offset Value Hex ASCII
0 CRASH_CATCHER_SIGNATURE_BYTE0 0x63 'c'
1 CRASH_CATCHER_SIGNATURE_BYTE1 0x43 'C'
2 CRASH_CATCHER_VERSION_MAJOR 0x03
3 CRASH_CATCHER_VERSION_MINOR 0x00

Example from a HexDump:

63430300

Flags

The next 32-bit word after the header contains flags. This word is dumped in little endian order. The currently supported flags are:

Flag Name Value Description
CRASH_CATCHER_FLAGS_FLOATING_POINT 1<<0 Flag to indicate that 32 single-precision floating point registers and FPSCR will follow integer registers.

Example from a HexDump when running on a processor which has the FPU enabled:

01000000

Integer Registers

The integer registers follow the 4-byte header. Each of the 4-byte registers are dumped in little endian byte order. They are always included in a dump and they are dumped in this order:

R0
R1
R2
R3
R4
R5
R6
R7
R8
R9
R10
R11
R12
SP
LR
PC
PSR
MSP
PSP
exceptionPSR

There are two versions of the Program Status Register, PSR, in the dump file. The first is the PSR value as seen by the code which caused the crash. The second is the PSR value for the fault handler itself.

Example from a HexDump:

00000000010000000200000003000000
04000000050000000600000007000000
08000000090000000A0000000B000000
0C000000
687F0010
FFFFFFFF0003000000000001
487F00107809001003000000

The newlines are inconsistently spaced in the HexDump output because the registers aren't all stored in one contiguous area of memory. Some are pushed automatically on the stack by the core when entering the fault handler and others are stacked by CrashCatcher's assembly language routine before passing control to CrashCatch_Entry(). This means that CrashCatcher issues more than one CrashCatcher_DumpMemory() call when dumping the registers and the HexDump module will insert a newline at the end of each such call and/or every 16 bytes.

Floating Point Registers

The floating point registers follow the integer registers in the dump if the CRASH_CATCHER_FLAGS_FLOATING_POINT flag was set. Each of these 4-byte registers are dumped in little endian byte order. They are dumped in this order:

S0
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
S15
S16
S17
S18
S19
S20
S21
S22
S23
S24
S25
S26
S27
S28
S29
S30
S31
FPSCR

Example from a HexDump:

000080BF0000803F0000004000004040
000080400000A0400000C0400000E040
00000041000010410000204100003041
00004041000050410000604100007041
00008041000088410000904100009841
0000A0410000A8410000B0410000B841
0000C0410000C8410000D0410000D841
0000E0410000E8410000F0410000F841
8D0080B2

Memory Regions

Each memory region that is dumped starts with a 2 word header indicating the starting (inclusive) and ending (exclusive) address of the region to be dumped. It is followed by the bytes from this region of memory. The header words are dumped in little endian order. The data itself is dumped as an array of bytes.

Field Length in bytes Notes
Starting_Address 4 Little Endian
Ending_Address 4 Little Endian
Data Ending_Address - Starting_Address

Example for a HexDump:

0000001000800010
00BE0AE00D782D0668400824400000D3
5840641EFAD1491C521E002AF2D17047
...

The example memory region:
Starts at 0x00 0x00 0x00 0x10 -> 0x10000000.
Ends at 0x00 0x80 0x00 0x10 -> 0x10008000.

The 0x8000, 32k, bytes from this region then follow. If there are more bytes in the dump after the end of the memory region data then it is the start of another memory region dump.

Fault Status Registers

On ARMv7-M processors, there are fault registers which give more information about the cause of a fault. CrashCatcher will automatically dump these registers as a memory region as well when it detects that it isn't running on a ARMv6-M processor.

Example from the end of a HexDump:

28ED00E03CED00E0
00000100000000400800000034ED00E0
38ED00E0

These 5 fault related registers start at an address of 0xE000ED28.

Register Name Address
Configurable Status Register 0xE000ED28
HardFault Status Register 0xE000ED2C
Debug Fault Status Register 0xE000ED30
MemManage Fault Address Register 0xE000ED34
BusFault Address Register 0xE000ED38

How to Clone

This project uses submodules (ie. CppUTest). Cloning therefore requires a few more steps to get all of the necessary code.

git clone --recursive git@github.com:adamgreen/CrashCatcher.git

- or -

git clone git@github.com:adamgreen/CrashCatcher.git
cd CrashCatcher
git submodule init
git submodule update

How to Build

Toolchain

  • ARM: Only the GNU Tools for ARM Embedded Processors are required for building the code to run on ARM Cortex-M processors.
  • Unit tests: If you are going to be modifying the code and want to run the unit tests then you will also need a gcc or clang toolchain appropriate for your platform:

Makefile

CrashCatcher has a makefile at the root of the project. It supports these top level targets:

  • arm: This builds the ARMv6-M and ARMv7-M versions of the CrashCatcher code. This is the default target if no other is provided to make.
  • all: This builds the CrashCatcher code for ARM targets and the host build environment for unit testing. It also executes the unit tests on the host and reports the test results.
  • clean: Cleans up all ouptut files from any previous builds. This forces everything to be rebuilt.
  • gcov: Like the all target, this builds all of the CrashCatcher code and runs the unit tests but it also instruments the binaries with code coverage tracking and then reports the code coverage obtained from executing these unit tests on the host build environment.

Example:
make all - Build CrashCatcher by just rebuilding what has changed since the last build and then rerun the unit tests.
make clean all - Build CrashCatcher by rebuilding everything.

The build has been tested on the following operating systems:

  • macOS High Sierra
  • Windows 10
  • Ubuntu 16.04