smol-rs/smol

Failing to check for closed socket causes future to spin, hogging executor thread

bradleyharden opened this issue · 2 comments

Hi,

While switching from Tokio to smol, I encountered an issue. My integration tests still worked, but only when run one at a time. When running multiple tests, some of them would hang indefinitely.

I traced the issue to a spawned future that was spinning, hogging the global executor thread. I'm not sure if this is strictly a problem with my code, or if the behavior of smol is non-ideal in this case.

Here is a minimal reproduction of the issue:

use std::thread::sleep;
use std::time::Duration;

use smol::io::{AsyncReadExt, AsyncWriteExt};
use smol::net::unix::UnixStream;

fn main() {
    let (a, b) = UnixStream::pair().unwrap();
    let (c, d) = UnixStream::pair().unwrap();
    spawn_connections(b, c);

    let (w, x) = UnixStream::pair().unwrap();
    let (y, z) = UnixStream::pair().unwrap();
    spawn_connections(x, y);

    std::thread::scope(|scope| {
        scope.spawn(|| {
            smol::block_on(test_echo(a, d));
            println!("Done with a/d");
        });
        scope.spawn(|| {
            sleep(Duration::from_secs(1));
            smol::block_on(test_echo(w, z));
            println!("Done with w/z");
        });
    });
}

async fn test_echo(mut f: UnixStream, mut g: UnixStream) {
    let sent = b"hello, world";
    let mut received = [0; 12];
    f.write_all(sent).await.unwrap();
    g.read_exact(&mut received).await.unwrap();
    assert_eq!(sent, &received);
}

fn spawn_connections(f: UnixStream, g: UnixStream) {
    let (mut f_read, mut f_write) = smol::io::split(f);
    let (mut g_read, mut g_write) = smol::io::split(g);
    smol::spawn(async move {
        let mut buf = vec![0; 1024];
        loop {
            let n = f_read.read(&mut buf).await.unwrap();
            // FIXME: Must test for 0 or this future will spin
            //if n == 0 {
            //    break;
            //}
            g_write.write_all(&buf[..n]).await.unwrap();
        }
    })
    .detach();
    smol::spawn(async move {
        let mut buf = vec![0; 1024];
        loop {
            let n = g_read.read(&mut buf).await.unwrap();
            // FIXME: Must test for 0 or this future will spin
            //if n == 0 {
            //    break;
            //}
            f_write.write_all(&buf[..n]).await.unwrap();
        }
    })
    .detach();
}

If you run this with smol 1.3.0 and cargo run, you will see that only "Done with a/d" is printed. There are two ways to fix the problem: either increase SMOL_THREADS to at least 3, or add the commented lines referenced by the FIXME notes.

The problem seems to be that the spawned futures are polled even after the UnixStreams are closed, so the calls to read return Ok(0). Since I don't check for zero and break out of the loop, the future spins forever.
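
For reference, this is the forwarding loop with the check applied; at EOF, read returns Ok(0) and the task exits instead of spinning:

loop {
    let n = f_read.read(&mut buf).await.unwrap();
    if n == 0 {
        // A zero-length read means the peer closed the socket;
        // stop forwarding instead of looping on empty reads.
        break;
    }
    g_write.write_all(&buf[..n]).await.unwrap();
}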

This is not a problem for Tokio. To me, that suggests that it is either wrong or at least non-ideal to continue polling the futures after the socket is dropped. But maybe this is just a different design decision? I'm not sure.

I think that your two futures in spawn_connections could be implemented using smol::io::copy, which properly handles checking for zero-sized reads indicating EOF.
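
For example, something along these lines (an untested sketch; it assumes the futures-lite-style smol::io::copy, which reads until EOF and returns the number of bytes copied):

fn spawn_connections(f: UnixStream, g: UnixStream) {
    let (f_read, mut f_write) = smol::io::split(f);
    let (g_read, mut g_write) = smol::io::split(g);
    smol::spawn(async move {
        // copy stops at the first zero-length read (EOF), so the
        // task finishes when the peer closes its socket.
        smol::io::copy(f_read, &mut g_write).await.unwrap();
    })
    .detach();
    smol::spawn(async move {
        smol::io::copy(g_read, &mut f_write).await.unwrap();
    })
    .detach();
}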

One of the main architectural differences between Tokio and Smol is that Tokio's reactor uses edge-triggered polling while Smol's uses oneshot polling. The intuitive way to think about this is:

  • Tokio checks for incoming data on the socket and then reads from it.
  • Smol tries to read/write/whatever from the socket, and if there is no data, waits for new data.

In most cases, the user-facing difference between these two approaches is not noticeable. However, since there is no more incoming data, Tokio returns Poll::Pending when trying to read from a closed socket, while Smol returns Poll::Ready(0).
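
To make the oneshot model concrete, here is a rough sketch using the polling crate (the crate underneath smol's reactor), written against the polling 3.x API; the key point is that every wakeup consumes the interest, which must be re-armed before waiting again:

use std::net::TcpListener;

use polling::{Event, Events, Poller};

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    listener.set_nonblocking(true)?;

    let poller = Poller::new()?;
    // In polling 3.x, `add` is unsafe because the source must stay
    // alive until it is removed from the poller with `delete`.
    unsafe { poller.add(&listener, Event::readable(0))? };

    let mut events = Events::new();
    loop {
        events.clear();
        poller.wait(&mut events, None)?;
        for _ev in events.iter() {
            // "Try the operation first": accept until WouldBlock.
            while let Ok((_stream, addr)) = listener.accept() {
                println!("accepted {addr}");
            }
            // Oneshot: the wakeup consumed the interest, so it must be
            // re-armed before the next `wait` or we never wake again.
            poller.modify(&listener, Event::readable(0))?;
        }
    }
}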

> I think that your two futures in spawn_connections could be implemented using smol::io::copy, which properly handles checking for zero-sized reads indicating EOF.

This was just a minimal example. In my actual code, I modify the data as it passes through the stream, so copy wouldn't be a viable option.
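
My real code is shaped more like this, where copy doesn't fit but the zero-length read check is still what ends the task (transform is a stand-in name for my actual processing):

smol::spawn(async move {
    let mut buf = vec![0; 1024];
    loop {
        let n = f_read.read(&mut buf).await.unwrap();
        if n == 0 {
            break; // EOF: the peer closed its end of the socket.
        }
        // `transform` is a placeholder for the real per-chunk work.
        let out = transform(&buf[..n]);
        g_write.write_all(&out).await.unwrap();
    }
})
.detach();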

> One of the main architectural differences between Tokio and Smol is that Tokio's reactor uses edge-triggered polling while Smol's uses oneshot polling.

Yes, I was at least tangentially aware of this. I encountered the problem above in integration tests using tokio/smol, but my actual code uses mio directly. And since mio's polling mode isn't configurable, I had to adapt my code for edge-triggering. I only later discovered the polling crate. I wish I had seen it earlier, since level-triggering would have been more appropriate for my situation.

> However, since there is no more incoming data, Tokio returns Poll::Pending when trying to read from a closed socket, while Smol returns Poll::Ready(0).

I thought this might be a conscious design decision, but I wasn't sure. Thanks for clarifying. I'll be sure to keep that difference in mind in the future.

Thanks for answering my question.