betrusted-io/xous-core

Disconnect on `Drop` idiom is unsound

Opened this issue · 8 comments

Background

Every service in Xous currently uses an idiom similar to this one:

use core::sync::atomic::{AtomicU32, Ordering};
static REFCOUNT: AtomicU32 = AtomicU32::new(0);

impl Drop for Ticktimer {
    fn drop(&mut self) {
        // De-allocate myself. It's unsafe because we are responsible for making
        // sure nobody else is using the connection.
        if REFCOUNT.fetch_sub(1, Ordering::Relaxed) == 1 {
            unsafe {
                xous::disconnect(self.conn).unwrap();
            }
        }
    }
}

The rationale behind this is as follows:

  • Connections are a scarce resource (31 per process) [1]
  • We would like to encourage connection re-use after an object is freed
  • When an object is freed, Drop gets called
  • We stick a call to xous::disconnect() on the connection inside Drop

Seems simple in concept. Except that, because connection IDs are scarce, they are shared between all threads within a process space. Thus, we have to count all the users of a connection across all threads before we know we can safely disconnect.

The solution to this was to put a reference counter in the lib wrapper for every service. This counter is a static atomic variable, and when we are the one to decrement it to zero, we know we are the last object standing and thus it is safe to disconnect().
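
For reference, the allocation side of the same idiom looks roughly like this (paraphrased from the ticktimer lib wrapper; exact details vary per service):

impl Ticktimer {
    pub fn new() -> Self {
        // Count this object before (re)using the process-wide connection.
        REFCOUNT.fetch_add(1, Ordering::Relaxed);
        // Connecting again to the same well-known server hands back the same
        // CID, so every Ticktimer object in this process shares one
        // connection-table entry.
        let conn = xous::connect(xous::SID::from_bytes(b"ticktimer-server").unwrap())
            .expect("couldn't connect to the ticktimer server");
        Ticktimer { conn }
    }
}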

Ah, but our troubles only begin here...

Reference Counting is Unsound

The reference count implementation is unsound. Although the count itself is atomic, the count-then-disconnect() sequence is not. Thus, we could read a reference count indicating we're the last object standing, and before we complete the disconnect(), our quantum is yielded and a new thread starts up and allocates a connection. When our quantum returns, we continue with the disconnect() call, severing the new thread's connection to the server.

A possible solution to this is to add another AtomicBool that indicates that we're in a DISCONNECT-LOCKED state. If this is set by Drop before checking the reference count and cleared only after the disconnect is completed, and all allocators check this and wait until we are out of DISCONNECT-LOCKED, perhaps this eliminates the unsoundness in reference counting.
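
A minimal sketch of the Drop side of that idea, mirroring the snippet above (the constructor would have to take the same flag around its increment-and-connect for this to actually close the race):

use core::sync::atomic::{AtomicBool, AtomicU32, Ordering};

static REFCOUNT: AtomicU32 = AtomicU32::new(0);
static DISCONNECT_LOCKED: AtomicBool = AtomicBool::new(false);

impl Drop for Ticktimer {
    fn drop(&mut self) {
        // Enter the DISCONNECT-LOCKED state before looking at the count, so
        // an allocator can't slip in between "count hit zero" and
        // "disconnect finished".
        while DISCONNECT_LOCKED
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            core::hint::spin_loop();
        }
        if REFCOUNT.fetch_sub(1, Ordering::Relaxed) == 1 {
            unsafe {
                xous::disconnect(self.conn).ok();
            }
        }
        DISCONNECT_LOCKED.store(false, Ordering::Release);
    }
}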

Except it gets worse.

std has Shadow Copies of Connections

std contains wrappers that connect to well-known servers to do work on behalf of the caller. If you do something with std::net, for example, it has to create a connection to net. This connection is outside the context of the net crate's lib API, so it is not included in that crate's reference count. Thus, a user who uses both std::net and Xous-native net calls (for example, making TCP connections while also checking wifi state) could have the legs swept out from underneath the std::net APIs if the Xous-native net objects go out of scope before the std::net operations are finished. (Historical note: we didn't always have std, so in the original design of the system the reference counting scheme could have been sound; once we implemented std, we didn't notice this problem until a code review just found it. In practice, I haven't seen this trigger a crash...but maybe it could explain some of the TLS instabilities seen by the folks working on sigchat!)

We probably haven't been bitten by this because, generally, shared "well-known" services such as the ticktimer, net (std::net), pddb (std::fs), and rtc (datetime) are bound at the top level of the main loop, so the objects live for the duration of the service. However, users who create and destroy threads that use any of these std calls could see crashes if these services aren't also bound at the top of the main loop, because then the reference count can actually drop to zero (and trigger a disconnect) when the threads exit.

Here are the problems with some obvious solutions:

  • Giving each connection a unique ID, including connections to the same server, would make disconnect() safe, since no two objects would ever share an ID. However, it would quickly exhaust the limited pool of connections.
  • Expanding the table to hold a large number of connections would force the kernel to allocate memory, adding complexity to the core design that we would like to avoid.

A close look at the process structure shows that we do have space to add 14 more connections to the connection table. Thus, one potential solution looks something like this:

  1. Connections to std-related services never disconnect() on Drop. This should probably be done sooner rather than later -- in practice, most of those connections never get to the point of calling disconnect(), so it will have minimal impact on current resource usage, but at least we can guarantee we don't step on std. Services that never use a particular std library call won't pay any penalty for this. The biggest problem would probably come if you had a server that was tight on connections and only ever needed a std call once, early on, and could otherwise reuse that connection space. In which case we should...
  2. Implement an explicit disconnect() API for services that have their Drop modified (see the sketch after this list). This lets users who are tight on connections, and who can guarantee they will never again use a std API that relies on the connection, free up space in the connection table.
  3. Expand the connection table to round out the extra usable space, so as to mitigate the impact of the persistent connections to the std layer. 14 extra connections would still be a net improvement, because the number of well-known servers that std relies upon is probably about 5 or 6 max.
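
For item 2, the shape of that API might be something like the following sketch (names are placeholders; the point is that Drop no longer disconnects, and the caller takes on the safety obligation explicitly):

impl Ticktimer {
    /// Explicit replacement for the old disconnect-on-Drop behaviour.
    /// Unsafe: the caller must guarantee that nothing else in this process
    /// (including the std wrappers) is still using this connection.
    pub unsafe fn disconnect(self) -> Result<(), xous::Error> {
        let conn = self.conn;
        drop(self); // runs the ordinary (now disconnect-free) Drop bookkeeping
        xous::disconnect(conn) // then frees the connection-table slot
    }
}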

Anyways, the impact of this problem seems to have been minimal: triggering it would lead to a very particular panic reported on the screen (I forget the exact error message, but it would probably say something about "Disconnected" or "Server not Found"), and we haven't been seeing that. The correctness impact is large, but the practical impact so far has been small, likely because most important connections are bound in such a way that their lifetimes are effectively eternal. But again, as things get more complicated (especially with the sigchat project) I suspect we'll run smack into this problem sooner or later.

References

[1] Connections are scarce because one design goal (perhaps a questionable one?) is for the kernel to never allocate memory. The result is that we have a limited amount of memory in which to track connections (all per-process state has to live in 4096 bytes).

This is the struct that describes a process, and it lives within the process itself at a fixed address:

/// Global parameters used by the operating system
pub inner: ProcessInner,

The connection map is here:

pub connection_map: [Option<NonZeroU8>; 32],

Also as a note, the Xous Book should be updated with a reference to this issue to explain why disconnect is hard and why it is unsafe.

xobs commented

An alternative is to make CID (or is it Connection?) an explicit type that doesn't implement Copy or Clone. That way you can't copy connection IDs. Then we wrap them in Arc<> and keep the Drop implementation.

Then we get rid of the reuse of CIDs -- each call to connect() will return a unique CID.

For cases where you need to connect to services in a loop, you could have a pool of connections that get handed out as copies of the Arc<CID> in question.

That would at least remove some of the unsoundness behind disconnect(), but how often do programs rely on the behaviour of connect() returning an existing CID?

Oooh, that's a nice idea. Make CID not copy-able, and force us to use runtime checks like Arc. That's kind of exactly the idea behind not making everything Copy/Clone by default.

The problem is you still have a shadow copy of the CID existing in std, but maybe for servers that intersect with std we make them never disconnect on Drop, because it's too unsafe to even try. And the std set is supposed to be small and enumerable -- is there even a location in the rust repo we can look at that shows all the well-known servers that std uses? Or are the server names sprinkled throughout the std implementation?

but how often do programs rely on the behaviour of connect() returning an existing CID?

I rely on this pretty often, especially with services that have threads that create/destroy handles to hardware objects frequently. I only rely on it to return an existing CID insofar as the thread's copy of the connection does not burn another entry in the connection table, which, by definition, means it is a copy of an existing CID. I don't actually care what the value is, just that it's not a different or new one, because connections are scarce.

A clear and simple example is the mount poller in pddb. If you want a service to wait for the PDDB to mount before starting, you create a mount poller, which spawns a thread with its own connection to the PDDB. If that thread didn't get a copy of the existing connection, it would eat another entry in the connection table, which becomes problematic in busy services that have lots of connections.
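
Concretely, the reliance is just this (illustrative only -- the server name is a placeholder, and xns is the usual xous-names wrapper):

let xns = xous_api_names::XousNames::new().unwrap();
let cid_main = xns.request_connection_blocking("_Example server_").unwrap();
std::thread::spawn(move || {
    let xns = xous_api_names::XousNames::new().unwrap();
    // Under today's semantics this returns the *same* CID as cid_main; what
    // matters is not the value, but that the worker thread doesn't burn a
    // new connection-table entry.
    let cid_worker = xns.request_connection_blocking("_Example server_").unwrap();
    assert_eq!(cid_main, cid_worker);
});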

xobs commented

In the pddb example you could create a global Arc<Mutex<Option<CID>>>, or do something like this Playground link:

use std::sync::{
    atomic::{AtomicU8, Ordering},
    Arc, OnceLock,
};

/// A mock Connection ID. Note that in the real world, the caller
/// wouldn't have access to the `u8` inside, and wouldn't be able
/// to call `clone()`.
pub struct CID(u8);

impl Drop for CID {
    fn drop(&mut self) {
        println!("Dropping connection {}", self.0);
    }
}

/// Keep track of the CID to ensure we don't duplicate it. If we
/// were getting a new connection each time, this number would
/// increment.
static CID_INDEX: AtomicU8 = AtomicU8::new(1);

/// Return the existing connection or create a new one.
pub fn connect() -> Arc<CID> {
    static OUR_CID: OnceLock<Arc<CID>> = OnceLock::new();
    OUR_CID
        .get_or_init(|| Arc::new(CID(CID_INDEX.fetch_add(1,
            Ordering::Relaxed))))
        .clone()
}

pub fn main() {
    // Demonstrate that explicit dropping works
    let dropped_cid = CID(0);
    drop(dropped_cid);

    // This should get dropped at the end of the program
    let _long_lived_cid = CID(100);

    // Use the long-lived connection. Since this is the first one,
    // it should have a CID of `1`.
    let cid = connect();
    println!("Got CID {}", cid.0);

    // Duplicate it. It should have the same CID as the first.
    let cid2 = connect();
    println!("On second connect got CID {}", cid2.0);
    
    // Dropping cid2 doesn't call the destructor
    drop(cid2);

    let mut thread_index = vec![];
    for i in 0..10 {
        thread_index.push(std::thread::spawn(move || {
            let c = connect();
            println!("Child thread {} got CID {}", i, c.0);
        }));
    }

    println!("Joining child threads...");
    for t in thread_index {
        t.join().unwrap();
    }
    
    // _long_lived_cid cannot be cloned or copied,
    // so this doesn't compile:
    //let other_cid = _long_lived_cid.clone();

    // Even if we drop `cid` here, the destructor doesn't
    // get run because it's still hanging out in the
    // `OnceLock<Arc<CID>>`
    drop(cid);

    // The _long_lived_cid should get dropped here as it
    // goes out of scope.
}

xobs commented

The magic there is entirely within connect() -- the connection it holds will live for the length of the entire program, and it hands out Arc<CID> references to callers across thread boundaries.

In the above program, the CID is never incremented, even though it's "connected to" from multiple threads. The actual CID itself is never cloned, only a reference to it is cloned. And because the reference persists throughout the lifetime of the program, it is never dropped.

I'm missing how this integrates with xous::connect()...doesn't the OS give you the CID? In the example you give the CID is a local variable that you're incrementing. How do I use that as an argument to say, xous::send_message() to route a message to a server?

Or are you re-using the term CID to mean something else in this example, and the intention is I still allocate a xous::CID separately that gets handed out in addition to this CID that is used to track how many copies of the xous::CID there are?

I think the example makes more sense to me if it were like this:

pub fn connect() -> Arc<CID> {
    static OUR_CID: OnceLock<Arc<CID>> = OnceLock::new();
    OUR_CID
        .get_or_init(|| {
            let xns = xous_api_names::XousNames::new().unwrap();
            Arc::new(CID(
                xns.request_connection_blocking(MY_SERVER_NAME)
                    .expect("can't connect to server"),
            ))
        })
        .clone()
}

Or something like that maybe?

and maybe here it's meant to have this actually call xous::disconnect():


impl Drop for CID {
    fn drop(&mut self) {
        println!("Dropping connection {}", self.0);
    }
}

instead of just printing something. Maybe that's what you meant -- I can't tell, and you just put placeholders in because you're trying to write something for godbolt, but there's a lot of context missing with all the Xous API calls stripped out. Or maybe you're suggesting something entirely different and more generic in terms of how CID tracking should be done, like the callers are responsible for allocating and tracking CIDs now?
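
For the Drop question, what I would have guessed is something along these lines (a sketch only; Connection is a placeholder newtype around a real xous::CID):

pub struct Connection(xous::CID); // placeholder name for the non-Clone wrapper

impl Drop for Connection {
    fn drop(&mut self) {
        // This only runs once the last Arc<Connection> is released, so no
        // other holder in this process should still be using the CID.
        unsafe {
            xous::disconnect(self.0).ok();
        }
    }
}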

xobs commented

Here is a better example where I've created mocks of xous::connect() and xous::disconnect(): https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5b4d8066dabf27a00a94a5a5dacb34da

I've also renamed the pool function to get_common_connection().

You could change xous::connect() to xns::request_connection_blocking() as you noted and it would behave the same.

The end result is that you have one static function that can be called from any thread that will give you the same long-lived connection ID. Furthermore, if you want to share connection IDs between threads you can do so by wrapping them in Arc<CID>. That way connections can be copied yet will not leak.

With this approach in place, we can change Xous so that subsequent calls to connect() will return a new CID.