rust-embedded/riscv-rt

Code doesn't run with full `main` signature.

bradjc opened this issue · 4 comments

I am testing the riscv-crates repository with effectively rust-lang/rust#52787 (I compiled my own version, but the changes should be the same at the end of the day) and found that I could not get the examples to actually run on the HiFive1 board. After some debugging, I tracked it down to:

https://github.com/riscv-rust/riscv-rt/blob/eb45c67890f862037c7475329e4fbc5683fe9e03/src/lib.rs#L183

and

https://github.com/riscv-rust/riscv-rt/blob/eb45c67890f862037c7475329e4fbc5683fe9e03/src/lib.rs#L259-L261

After changing it to:

fn main() -> isize;

and

unsafe {
    main();
}

the blink_delay and hello_world apps started working.

Cool! I'll try it out once rust-lang/rust#52787 lands. In the mean time you can submit a PR if you like, I'll also start opening PR's instead of pushing to master. It would help reviewing this PR if you find the commit to rust-lang making this change necessary and add it to the commit message.

I don't actually know what is making this change required. It was made in the cortex-m-rt crate as well: rust-embedded/cortex-m-rt@8dd26b2#diff-b4aea3e418ccdb71239b96952d9cddb6R35

I was curious what was causing the previous version to break so I dug into this a bit more...

I am just building a random sample problem, and I see that without this patch it does indeed seem to enter the trap handler as soon as it starts up. I hacked ebreak into the very top of the _start_trap routine so that I could use gdb to see where exactly the trap was coming from. I figured out it was mcause 7 (store fault) at this address in _start_rust:

Dump of assembler code for function _start_rust:
   0x20400020 <+0>:	lui	a0,0x80000
   0x20400024 <+4>:	mv	a0,a0
   0x20400028 <+8>:	lui	a1,0x80000
   0x2040002c <+12>:	addi	a1,a1,4 # 0x80000004
   0x20400030 <+16>:	auipc	ra,0xb
   0x20400034 <+20>:	jalr	1020(ra) # 0x2040b42c <r0::zero_bss>
   0x20400038 <+24>:	j	0x2040003c <_start_rust+28>
   0x2040003c <+28>:	lui	a0,0x80000
   0x20400040 <+32>:	addi	a0,a0,4 # 0x80000004
   0x20400044 <+36>:	lui	a1,0x80000
   0x20400048 <+40>:	addi	a1,a1,4 # 0x80000004
   0x2040004c <+44>:	lui	a2,0x20450
   0x20400050 <+48>:	addi	a2,a2,584 # 0x20450248
   0x20400054 <+52>:	auipc	ra,0xb
   0x20400058 <+56>:	jalr	1120(ra) # 0x2040b4b4 <r0::init_data>
   0x2040005c <+60>:	j	0x20400060 <_start_rust+64>
   0x20400060 <+64>:	lui	a0,0x20400
   0x20400064 <+68>:	addi	a0,a0,176 # 0x204000b0 <_start_trap>
   0x20400068 <+72>:	csrw	mtvec,a0
   0x2040006c <+76>:	auipc	ra,0xb
   0x20400070 <+80>:	jalr	816(ra) # 0x2040b39c <core::ptr::null>
=> 0x20400074 <+84>:	sw	a0,12(sp)
   0x20400078 <+88>:	j	0x2040007c <_start_rust+92>
   0x2040007c <+92>:	li	a0,0
   0x20400080 <+96>:	lw	a1,12(sp)
   0x20400084 <+100>:	auipc	ra,0x1
   0x20400088 <+104>:	jalr	1300(ra) # 0x20401598 <main>
   0x2040008c <+108>:	sw	a0,8(sp)
   0x20400090 <+112>:	j	0x20400094 <_start_rust+116>
   0x20400094 <+116>:	j	0x20400098 <_start_rust+120>
   0x20400098 <+120>:	auipc	ra,0xb
   0x2040009c <+124>:	jalr	884(ra) # 0x2040b40c <riscv::asm::wfi>
   0x204000a0 <+128>:	j	0x20400098 <_start_rust+120>
End of assembler dump.

I'm still getting the hang of RISCV assembly but it seems like that instruction is storing the return value of core::ptr::null() (which is of course just 0, a bit of a waste really) into a stack local at $sp + 12. And then right afterwards it loads it back into a1 to pass to main().

But at this point the stack pointer is still set to its initial value _stack_start = 0x80004000 which is one past the last address of RAM.

That seems to be because we are declaring the start_rust function as #[naked] which means it does not create a stack frame for itself. That is probably the real bug, since it's a Rust function so we can't really assume the compiler won't try to use stack locals.

Also I think the fix might not be quite correct either.

In cortex-m-rt they are actually not using the "real" main at all, there is a function main() -> ! but it is kind of an implementation detail. The program is supposed to use #[no_main] and declare its entry point using a macro entry!, it's mentioned in this comment https://github.com/rust-embedded/cortex-m-rt/blob/6c269f195c9a705162fd912beb952b430cc28336/src/lib.rs#L89 where it shows an example program:

//! // IMPORTANT the standard `main` interface is not used because it requires nightly
//! #![no_main]
//! #![no_std]
//!
//! #[macro_use(entry, exception)]
//! extern crate cortex_m_rt as rt;
//!
//! // makes `panic!` print messages to the host stderr using semihosting
//! extern crate panic_semihosting;
//!
//! use rt::ExceptionFrame;
//!
//! // use `main` as the entry point of this application
//! entry!(main);

We should probably follow the same pattern.

What we have right now though, is assuming the program will declare a "real" main() function which rustc will force to have the UNIX convention of either two args returning int or nullary returning int. But if the program uses the two arg form, now we are just calling that with uninitialized registers, right?

Ultimately it doesn't make much difference because the program could never expect anything useful to be in argc or argv anyway. But the #[no_main] approach used in cortex-m-rt makes it clearer I think.

I will send some PRs later when I get more time :-)