olofk/serv

mbedTLS on SERV

Closed this issue · 5 comments

Hi,

I am currently trying to evaluate the performance of doing some crypto on SERV, but I'm running into some issues with Zephyr on SERV.

I wrote the attached example (main.c) that just initializes the RSA library according to their example, and then prints a string in a loop (no actual en-/decryption happens yet).

Yet, when I run the example (both on real HW and in simulation through fusesoc+verilator), the entire system hangs indefinitely (real HW) or resets (sim). If you remove the mbedtls init calls, the loop runs fine.

I have been trying to troubleshoot this a bit myself already, but I was hoping you might have some more experience analyzing these issues.

I attached the following files: rsa-test-zephyr-serv.zip

  • main.c: My test program (don't forget to enable the mbedtls RSA + entropy configs)
  • log.txt: Output from verilator
  • rsa-test-zephyr-serv.hex: The hex file I tested with
  • run.sh: My run command for the simulation
  • rsa-test-zephyr-serv.tar.xz: the entire application directory for zephyr (including configs and build results that produced the described output)

Let me know if you have any ideas on how to approach this / if you find out anything. Thank you for your help!

PS: I have not yet tried running the example program on any other RISC-V target. That's still on my TODO list.

I haven't found time to look at this issue, but I did just fix an issue that made the regression tests fail. Probably unrelated, but could be worth trying with the latest git version.

Also, during my debugging sessions, I have develeoped some rudimentary tools that can make life slightly easier.

On the trace branch, I have added a WIP tracing module that shows register values and what opcode that was just executed. This makes things slightly easier to follow when doing waveform debugging.

You can also switch on the --trace_pc switch for simulations. This will generate a trace.bin that contains the PC trace. Together with this questionable python hack you can do more analysis of what the CPU is doing. By default it will generate a trace with annotations of all jumps if you run it with python trays.py /path/to/elf/file /path/to/trace.bin

Another thing I typically do is to take the assembler listing of the program you want to run and check for illegal instructions or attempted accesses to unimplemented CSR registers. To save size, SERV has zero protection from these things and will happily try to execute instructions that it doesn't have the slightest clue about how to actually run. Looking quickly at the file you sent me, there is a read from mhartid. That might be totally fine, but worth taking an extra look at. I also see a wfi instruction + a couple of zext.b (which a quick googling tells me is the same as andi??) that could be wort investigating.

I didn't have much time to work on this, but I at least looked at your suggestions.

1. mhartid and wfi

There is only one access to mhartid, and it's conveniently in the same block as the wfi: The bootup section for unused cores (which we should never hit):
Screenshot from 2024-09-06 11-25-11

I'm pretty sure everything here is fine as we only have one CPU core, and even if the read from mhartid returned a core other than core 0, I'd expect a more severe error than a crash late in the program.

2. zext.b

I looked at the disassembly. zext.b produces the same assembly code as an andi with immediate 255.

3. Your latest updates

I pulled the latest main branch today, and it sadly does not fix the issue.

.

I hope I'll have some more time to look into this in the upcoming days, but no promises :/

I finally had some more time to work on this, and I think I can give you (and SERV) the all-clear here as I believe this is a problem with Zephyr/mBEDTLS.

I instantiated a neorv32 core in place of SERV in my design and ran into exactly the same issue.

Also, I completed the decryption code and, lo and behold, it actually successfully decrypts a file -- and then immediately crashes with an invalid memory access (but again, both on neorv32 and SERV).

Here's the output just for reference (and yes, my system is reaaaaallly slow):

{2024-11-08 14:07:20.189759} Init memory...
{2024-11-08 14:07:20.205713} Init mbedtls...
{2024-11-08 14:07:20.221506} Read private key...
{2024-11-08 14:07:20.381590} Import keys...
{2024-11-08 14:07:20.413669} Complete RSA init...
{2024-11-08 14:07:27.832654} Checking data...
{2024-11-08 14:07:27.833128} Decrypting...
{2024-11-08 14:29:25.507402} 
{2024-11-08 14:29:25.507525}   . OK
{2024-11-08 14:29:25.507770} 
{2024-11-08 14:29:25.507829} The decrypted result is: 'Hello World 1.Hello World 2.Hello World 3.Hello World 4.Hello World 5.Hello World 6.Hello World 7.'
{2024-11-08 14:29:25.509669} 
{2024-11-08 14:29:25.509709} RSA part done. Entering mainloop.
{2024-11-08 14:29:25.509968} Hello World! trashernet_soc 98
{2024-11-08 14:29:25.571329} *** Booting Zephyr OS build v3.7.0 ***
{2024-11-08 14:29:25.603074} [00:00:00.000,000] <err> os: 
{2024-11-08 14:29:25.619525} [00:00:00.000,000] <err> os:  mcause: 4, Load address misaligned
{2024-11-08 14:29:25.667271} [00:00:00.000,000] <err> os:   mtval: af45db15
{2024-11-08 14:29:25.699426} [00:00:00.000,000] <err> os:      a0: 40010868    t0: 00000009
{2024-11-08 14:29:25.747007} [00:00:00.000,000] <err> os:      a1: 00000000    t1: 00000008
{2024-11-08 14:29:25.779210} [00:00:00.000,000] <err> os:      a2: 000004e2    t2: 00000000
{2024-11-08 14:29:25.827006} [00:00:00.000,000] <err> os:      a3: 00000000    t3: 00000010
{2024-11-08 14:29:25.859180} [00:00:00.000,000] <err> os:      a4: af45db05    t4: 00000006
{2024-11-08 14:29:25.906909} [00:00:00.000,000] <err> os:      a5: 40013ab4    t5: 00000000
{2024-11-08 14:29:25.939393} [00:00:00.000,000] <err> os:      a6: 00000000    t6: 00000010
{2024-11-08 14:29:25.986918} [00:00:00.000,000] <err> os:      a7: 0000002d
{2024-11-08 14:29:26.018877} [00:00:00.000,000] <err> os:      sp: 40012fb0
{2024-11-08 14:29:26.051104} [00:00:00.000,000] <err> os:      ra: 4000dd30
{2024-11-08 14:29:26.082997} [00:00:00.000,000] <err> os:    mepc: 4000dab0
{2024-11-08 14:29:26.115157} [00:00:00.000,000] <err> os: mstatus: 00000000
{2024-11-08 14:29:26.147255} [00:00:00.000,000] <err> os: 
{2024-11-08 14:29:26.178890} [00:00:00.000,000] <err> os: call trace:
{2024-11-08 14:29:26.210827} [00:00:00.000,000] <err> os:       0: ra: 4000dab0
{2024-11-08 14:29:26.242990} [00:00:00.000,000] <err> os:       1: ra: 4000d99c
{2024-11-08 14:29:26.274772} [00:00:00.000,000] <err> os:       2: ra: 4000ccd0
{2024-11-08 14:29:26.323032} [00:00:00.000,000] <err> os:       3: ra: 400051c8
{2024-11-08 14:29:26.354824} [00:00:00.000,000] <err> os:       4: ra: 4000d78c
{2024-11-08 14:29:26.386622} [00:00:00.000,000] <err> os:       5: ra: 4000d844
{2024-11-08 14:29:26.418884} [00:00:00.000,000] <err> os:       6: ra: 40000ea8
{2024-11-08 14:29:26.451341} [00:00:00.000,000] <err> os: 
{2024-11-08 14:29:26.483046} [00:00:00.000,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 18
{2024-11-08 14:29:26.530826} [00:00:00.000,000] <err> os: Current thread: 0x58a2f65 (unknown)
{2024-11-08 14:29:26.578357} [00:00:00.000,000] <err> os: Halting system

The interesting thing about the crash that it's in Zephyr's remove_timeout() function, called through various levels from my k_msleep(1000) call in the mainloop. And even weirder, the memory address (from register a4) is not just misaligned (as the error suggests), it's outright garbage (0xaf45db05 is outside of valid memory address space for my system). Looks like some kind of software-bug caused memory corruption. (Note: Interrupts were disabled for this test.)

Either way, since neorv32 behaves the exact same way, I'm pretty sure this is not a bug in SERV, so we can close this issue.

Thank you for your help!

Markus

m42uko commented

For anyone wondering, and for me to confirm that this is not a problem with SERV, here's what the problem was: I ran out of stack memory in the main thread and overwrote critical sections of OS memory. Doh!

The solution is to either increase the stack memory for the main/init process in the .config, or to create a second thread with more stack memory. Here's my working example utilizing the second method: decrypt.zip Takes about 25 minutes to decrypt 128 bytes (@ 25 MHz core clock), but it works :D

olofk commented

Very cool. Thanks for sharing.

And that exact problem of running out of stack space has happened to me as well. It's an absolute pain to debug and looks really similar to a hardware bug. Would be nice to have some better way to detect this problem, at least in simulation.