s-matyukevich/raspberry-pi-os

Failed to move to user mode on Raspberry Pi 4B

OscarShiang opened this issue · 3 comments

Hello,

I was trying to run lesson05 sample code on my Raspberry Pi 4B but it seems to have
some problems while switching to user mode.

Here are the things I changed to run on my board:

  • modify peripheral base to 0xFE000000
  • change AUX_MU_BAUD_REG to 541
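
For reference, the value 541 is consistent with the usual mini UART formula, baud = core_clock / (8 * (reg + 1)). The sketch below is only an illustration: the 115200 baud rate and the core clock figures (500 MHz on the 4B, 250 MHz on the 3B) are assumptions, not values stated in this thread, and `mini_uart_baud_reg` is a hypothetical helper name.

```python
# Sketch: derive the mini UART baud register value from the core clock.
# Assumptions (not from the original post): 115200 baud, and a core clock
# of 500 MHz on the Pi 4B versus 250 MHz on the Pi 3B.

def mini_uart_baud_reg(core_clock_hz: int, baud: int) -> int:
    """Solve baud = core_clock / (8 * (reg + 1)) for reg, truncating."""
    return core_clock_hz // (8 * baud) - 1

print(mini_uart_baud_reg(500_000_000, 115200))  # assumed Pi 4B core clock
print(mini_uart_baud_reg(250_000_000, 115200))  # assumed Pi 3B core clock
```

If the core clock on your board is configured differently in config.txt, the register value would need to change accordingly.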

The output I got from UART is:

Kernel process started. EL 1
io4, ESR: 2000000, address: e74

Is there any workaround for this error?

Hi @OscarShiang:

I had similar issues to you, but ended up getting this working on the Pi 4B (as well as 3B).

Before I begin my explanation, I should note that I couldn't get lesson04 to work until I edited the linker script to set the start address at 0x80000. Sergey's code as is just didn't work on either the 3B or 4B. Changing around my config.txt and setting the linker script to use 0x80000 (the Pi's default load address in 64-bit mode) made lesson04 work.

EDIT: Since I wrote my initial comment, I discovered that the current bootloader release ignores the kernel_old and kernel_address options in config.txt. Using kernel_old=1 with a pre-release bootloader (see raspberrypi/firmware#1561) will make Sergey's (and my) code work starting at address 0x0.


First thing I did was look up the ESR value in the ARM documentation. I used Python to quickly get the binary representation of the hexadecimal value printed in the error:

Python 3.8.1 (default, Jan  9 2020, 16:15:08) 
[Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> bin(0x2000000)
'0b10000000000000000000000000'
>>> bin(0x2000000)[2:]
'10000000000000000000000000'
>>> len(bin(0x2000000)[2:])
26

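Only bit 25 is set. In ESR_EL1 that is the IL (instruction length) bit, and the EC field in bits [31:26] is 0b000000. The documentation isn't terribly helpful here, as it lists the following conditions under which IL is set to 1: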

  • An SError interrupt.
  • An Instruction Abort exception.
  • A PC alignment fault exception.
  • An SP alignment fault exception.
  • A Data Abort exception for which the value of the ISV bit is 0.
  • An Illegal Execution state exception.
  • Any debug exception except for Breakpoint instruction exceptions. For Breakpoint instruction exceptions, this bit has its standard meaning:
    • 0b0: 16-bit T32 BKPT instruction.
    • 0b1: 32-bit A32 BKPT instruction or A64 BRK instruction.
  • An exception reported using EC value 0b000000.
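
To make the bit counting above less error-prone, here is a minimal sketch that splits an ESR_EL1 value into its top-level fields (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]). The helper name `decode_esr` is made up for illustration and is not part of the lesson code.

```python
# Sketch: split an AArch64 ESR_ELx value into EC, IL and ISS.
# decode_esr is a hypothetical helper, not something from the tutorial.

def decode_esr(esr: int) -> dict:
    return {
        "EC": (esr >> 26) & 0x3F,    # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,     # instruction length bit, bit 25
        "ISS": esr & 0x1FFFFFF,      # instruction specific syndrome, bits [24:0]
    }

fields = decode_esr(0x2000000)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

For 0x2000000 the EC field is zero, which the architecture labels "Unknown reason", so the ESR alone does not say much and the faulting address becomes the more useful clue.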

It turns out I had flipped one of my disable_irq calls in entry.S to be enable_irq. I also noticed that handle_invalid_entry was being reused, even though it was meant to perform a context switch from one of the unused/unimplemented exception conditions (e.g. FIQ). The new exceptions we had created in lesson05 were all raised after a context switch had already happened, so we needn't perform kernel_entry again.

I refactored the handle_invalid_entry macro into two separate macros: handle_error and handle_invalid_entry. I also added a parameter to the show_invalid_entry_message function to track EL, since get_el() itself triggers an exception in EL0.

From there, I still had issues, so I began debugging. I created a DEBUG macro, and I also began invoking my new handle_error macro (with a new DEBUG_STATEMENT code) to determine how far the code had gone before terminating. (This is more or less a binary search through the expected code path. Once I saw my DEBUG_STATEMENT message, I could start to narrow down which line triggered it.)

Once I did all of this, the code indicated a new error, SYNC_ERROR, ESR: 0x92000061. The docs indicated some sort of memory alignment issue causing a "Data Abort" exception. Using my debug infrastructure, I determined that my problem was using Clang/LLVM rather than GCC as my cross-compiler. I added a -mno-unaligned-access flag to Clang, and my code worked perfectly.
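
As a rough cross-check of that reading of the docs, the sketch below decodes the Data Abort specific bits of an ESR value. The field positions (EC in bits [31:26], WnR in bit 6, DFSC in bits [5:0]) and the constants come from the architecture manual as I understand it; `describe_data_abort` is a hypothetical helper name.

```python
# Sketch: interpret an ESR value as a Data Abort, if it is one.
# describe_data_abort is a hypothetical helper, not part of the lesson code.

DATA_ABORT_LOWER_EL = 0x24  # EC: Data Abort taken from a lower exception level
DATA_ABORT_SAME_EL = 0x25   # EC: Data Abort taken without a change in level
ALIGNMENT_FAULT = 0x21      # DFSC: alignment fault

def describe_data_abort(esr: int):
    ec = (esr >> 26) & 0x3F
    if ec not in (DATA_ABORT_LOWER_EL, DATA_ABORT_SAME_EL):
        return None  # not a Data Abort, so the bits below mean something else
    dfsc = esr & 0x3F
    return {
        "from_lower_el": ec == DATA_ABORT_LOWER_EL,
        "is_write": bool((esr >> 6) & 0x1),  # WnR bit
        "dfsc": dfsc,
        "alignment_fault": dfsc == ALIGNMENT_FAULT,
    }

print(describe_data_abort(0x92000061))
```

For 0x92000061 this reports an alignment fault on a write taken from a lower exception level, which fits an unaligned store emitted by the compiler for code running in EL0.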

If you're using GCC, I doubt you're having the same issues I had, but the debug infrastructure should help you solve the problem. Sergey's code worked for me once I updated the linker script and ran his build process instead of Clang. It does indeed work on the 4B for me.

Hi @elgertam,

Thanks for your wonderful explanation. I can now successfully execute the code.

Here is the modification I made:

  • Use raspi-config provided by RPi-OS to downgrade my firmware to 634e380a4d041492f859712bd2c81112a535b515.
  • Read the IRQ value from IRQ_BASIC_PENDING instead of IRQ_PENDING_1.

I am not sure whether this would help you, but it helped me after I integrated a printf that assumed VFP was enabled when it was not, which triggered an exception.

I have added this line to my makefile to generate readable assembly text:
$(LLVMPATH)llvm-objdump -D kernel8.elf > kernel8.lst

Each line of kernel8.lst shows a memory address and the instruction located there. You can print the address at which the exception occurred (which you can read with something like mrs x2, elr_el1) and then look it up in kernel8.lst. This way you wouldn't need to binary search with debug tags manually ;)

void exc_unhandled(uint32_t type, uint64_t esr, uint64_t elr) {
  printf("Unhandled exception: %s (ESR=%X, ADDR=%X) - ", 
    exc_strings_type[type], esr, elr);
    ...
}
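
The lookup in kernel8.lst can also be scripted. This is a minimal sketch that assumes each instruction line in the listing has the shape "address: encoding mnemonic operands"; the exact llvm-objdump layout may differ between versions, and both `find_instruction` and the sample lines are made up for illustration.

```python
import re

# Assumed line shape: optional spaces, hex address, colon, then the rest.
# Symbol headers such as "0000000000080000 <_start>:" do not match.
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s+(.*)$")

def find_instruction(listing_lines, address):
    """Return the text after 'address:' for the matching line, or None."""
    for line in listing_lines:
        match = LINE_RE.match(line)
        if match and int(match.group(1), 16) == address:
            return match.group(2).strip()
    return None

# Made-up listing fragment, only to show the expected input format.
sample = [
    "0000000000080000 <_start>:",
    "   80000: d503201f \tnop",
    "   80004: d503205f \twfe",
]
print(find_instruction(sample, 0x80004))
```

With a real listing you would pass the open file object for kernel8.lst instead of the sample list, together with the ELR value printed by the exception handler.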