betrusted-io/xous-core

Guru Meditation on listen response

Opened this issue · 7 comments

bunnie commented

There is a relatively likely Guru Meditation on Precursor hardware that manifests around the time a sync response is received from the Matrix server.

thread '<unnamed>' panicked at 'unsafe precondition(s) violated: NonNull::new_unchecked requires that the pointer is non-null', library/core/src/panicking.rs:123:5 thread caused non-unwinding panic. aborting.

This is never seen in hosted mode, and it may have something to do with smoltcp.

bunnie commented

I'm hacking on the smoltcp stack now, I'll try to reproduce this as part of the adventures.

bunnie commented

This has, unfortunately, re-appeared despite the refactored smoltcp stack. So it's not that...

bunnie commented

Unfortunately the panic message is in std, and not linked to code within the xous code base specifically. This means probably one of two things happened -- a double-panic, or an OOM.

A double panic could be a locking issue -- it looks like NonNull is used in the rwlock primitives when they print their panic message, and if the lock is not poisoned correctly you'll get something about a NonNull error.
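For reference, here is a minimal sketch of how lock poisoning normally behaves with the standard `std::sync::RwLock` in hosted mode. It does not reproduce the NonNull message above and makes no claim about the Xous lock implementation; the function name `poison_by_panicking` is made up for illustration.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical helper: panics in a second thread while it holds the
/// write guard, then reports whether the lock was marked as poisoned.
fn poison_by_panicking() -> bool {
    let lock = Arc::new(RwLock::new(0u32));
    let writer = Arc::clone(&lock);

    // The spawned thread panics with the write guard still alive, so the
    // guard is dropped during unwinding and the lock gets poisoned.
    let result = thread::spawn(move || {
        let mut guard = writer.write().unwrap();
        *guard += 1;
        panic!("simulated failure while holding the lock");
    })
    .join();

    // join() returns Err because the thread panicked.
    assert!(result.is_err());
    lock.is_poisoned()
}

fn main() {
    println!("poisoned: {}", poison_by_panicking());
}
```

Once the lock is poisoned, later `read()` and `write()` calls return `Err`, and calling `unwrap()` on that result is a common way to end up with a second panic.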

The other option is the allocator simply couldn't return data, and Rust does not accept that as correct behavior and will respond with a panic, probably quite similar to this. I'm increasing the heap size and seeing if that makes any difference...
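One way to tell an OOM apart from other failures is to make a suspect allocation fallible, so it returns an error instead of going through the default allocation-failure path. The sketch below uses `Vec::try_reserve_exact` from std; the helper name `try_alloc` is hypothetical, and whether this is practical at the allocation sites involved here is an assumption.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: allocate a zeroed buffer of `len` bytes, returning
/// an error instead of aborting if the allocator cannot satisfy the request.
fn try_alloc(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Fails with Err if the capacity overflows or the allocator returns null.
    buf.try_reserve_exact(len)?;
    // The capacity is already reserved, so this does not allocate again.
    buf.resize(len, 0);
    Ok(buf)
}

fn main() {
    match try_alloc(64 * 1024) {
        Ok(buf) => println!("allocated {} bytes", buf.len()),
        Err(e) => println!("allocation failed: {e}"),
    }
}
```

Logging the `Err` case at a few large allocation sites would show whether the heap is actually being exhausted when the system is busy.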

xobs commented

I have an unwinding version in the works. This may be useful. Unfortunately, while it gives you a stack trace, it does not give you symbol names. So you'd get a list of addresses that you'd manually correlate with the ELF file.
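The manual correlation could be scripted roughly as follows. This is only a sketch: it assumes a symbol table has already been extracted from the ELF file (for example with `nm`) and sorted by address. The `Symbol` type, the `resolve` function, and the addresses and names in `main` are all made up for illustration.

```rust
/// Hypothetical symbol entry: start address and name taken from the ELF file.
struct Symbol {
    addr: u32,
    name: &'static str,
}

/// Find the symbol containing `pc`, assuming `symbols` is sorted by `addr`.
/// Returns the symbol name and the offset of `pc` from its start.
/// Symbol sizes are not tracked, so an address past the end of the last
/// function still resolves to that last symbol.
fn resolve<'a>(symbols: &'a [Symbol], pc: u32) -> Option<(&'a str, u32)> {
    // Index of the first symbol that starts after `pc`.
    let idx = symbols.partition_point(|s| s.addr <= pc);
    // The symbol just before it contains `pc`; None if `pc` precedes all symbols.
    let sym = symbols.get(idx.checked_sub(1)?)?;
    Some((sym.name, pc - sym.addr))
}

fn main() {
    // Made-up addresses and names, standing in for real `nm` output.
    let symbols = [
        Symbol { addr: 0x0001_0000, name: "main" },
        Symbol { addr: 0x0001_0400, name: "net_poll" },
        Symbol { addr: 0x0001_0900, name: "rwlock_read" },
    ];
    for pc in [0x0001_0410u32, 0x0001_0904] {
        match resolve(&symbols, pc) {
            Some((name, off)) => println!("{pc:#010x} -> {name}+{off:#x}"),
            None => println!("{pc:#010x} -> <unknown>"),
        }
    }
}
```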

Unfortunately, enabling unwind support will increase the file size. Would you be interested in trying it out? I can get you a build that would mostly work in about two weeks.

bunnie commented

Yes, would be very interested in that. I don't mind manually correlating through the ELF file to trace this one out -- it's going to be faster than trying to trigger the bug and infer what's happening. As far as I can tell it only happens when things are "really busy"; I don't have a good synthetic test bench for it, and it doesn't trigger reliably.

Re: file size, we can make unwinding support a flag (or you can just make a PR and I can see what the diff is and create a feature flag for it; if it's a different std that's also fine, I can do a local-only build of std with unwinding -- would only need to use it for really hard cases like this).

bunnie commented

Might also be worth trying to trigger this in Renode, but I still haven't figured out how to get full-on networking to work in Renode. I'm able to get to the point where I can get an IP address, but the thing that's emulating the network (I think it's a tiny linux instance?) doesn't respond to ARPs and doesn't NAT packets to the world.

xobs commented

It doesn't NAT packets, yeah. Getting it online is still somewhat manual:

  1. Create an interface with Renode. If you run xous-release-tap.resc, this will create an actual interface on your system called renodetap, which carries almost exactly the Ethernet frames that get sent to the wifi chip.
  2. Get renodetap online. This can be done with "Share My Internet Connection" which I believe NetworkManager can do. Or you can bridge it. Or you can manually use iptables to set it up.

You might try looking at https://help.ubuntu.com/community/Internet/ConnectionSharing for information, assuming you're using Ubuntu.