ringbahn/iou

Possible bug using SubmissionQueueEvent::prep_timeout

mxxo opened this issue ยท 6 comments

mxxo commented

Trying to submit a timeout, I obtain an invalid argument error:

use std::io;                                                                                
use iou::sys;

#[test]
fn timeout_test() -> io::Result<()> {
    let mut io_uring = iou::IoUring::new(2)?;

    let mut sq = io_uring.sq();
    let mut sqe = sq.next_sqe().unwrap();
  
    // make a timeout
    let timeout_spec: _ = sys::__kernel_timespec {
        tv_sec:  0 as _,
        tv_nsec: 2e8 as _,
    };  
  
    unsafe { sqe.prep_timeout(&timeout_spec); }
  
    io_uring.sq().submit()?;
  
    // check that timeout reading works
    {   
        let mut cq = io_uring.cq();
        let cqe = cq.wait_for_cqe()?;
        assert!(cqe.is_timeout());
    }   
  
    Ok(())
}
running 1 test
Error: Os { code: 22, kind: InvalidInput, message: "Invalid argument" }
thread 'timeout_test' panicked at 'assertion failed: `(left == right)`

In iou, the timeout file descriptor is set to 0.
https://github.com/withoutboats/iou/blob/427273598e9ab7759a19b6301803eaba870dfc18/src/sqe.rs#L183

In liburing, it looks like the file descriptor is set to -1 based on:

static inline void io_uring_prep_timeout(struct io_uring_sqe *sqe,
					 struct __kernel_timespec *ts,
					 unsigned count, unsigned flags)
{
        io_uring_prep_rw(IORING_OP_TIMEOUT, sqe, -1, ts, 1, count);
//                                                ^ the fd arg

source

and

static inline void io_uring_prep_rw(int op, struct io_uring_sqe *sqe, int fd,
				    const void *addr, unsigned len,
				    off_t offset)

source

Changing the fd from 0 to -1 in SubmissionQueueEvent::prep_timeout, I then obtain a BADFILE error:

running 1 test
Error: Os { code: 9, kind: Other, message: "Bad file descriptor" }
thread 'timeout_test' panicked at 'assertion failed: `(left == right)`

Am I doing something silly here? Maybe there's something up with my kernel version (5.2.14)?

Thank you for your time.

Looks like this was changed in liburing upstream in 3c9d9580.

Interestingly I also see a bad file descriptor when running the liburing timeout test, but there is every possibility that I'm confused. ๐Ÿ˜ƒ

$ git log --oneline|head -n1
5943354 test/connect: fixup a few style issues
$ test/timeout
test_single_timeout: Timeout: Bad file descriptor
test_single_timeout failed

I saw when I updated the submodule in uring-sys that several of these prep APIs have started passing -1 for the fd, though I haven't investigated the impact of that. Someone should figure that out.

But my guess as to the most likely reason this is failing for y'all even after the change is that your kernel doesn't actually support timeouts! I'm certain the one I've been testing against (5.2.11) doesn't. This feature was added to the io_uring interface since September, I'm not sure what version number is the first to have timeouts but its extremely recent.

mxxo commented

I'm sorry for jumping the gun, timeout support looks like it was included in 5.4, which was released... five days ago ๐Ÿ˜….

For the fd as -1, it seems like a conceptual change (-1 more invalid than 0) but without any material difference.

3c9d9580 commit msg:

If an operation doesn't need a file, set fd to an invalid value.
This would help to spot an unwanted fdget() in the kernel, and is
conceptually more correct (though negotiable)

My impression is keeping the fd's 0 shouldn't break timeouts, nops, etc. (when they're supported).

Cool!

As to the fds, if we're currently setting any of them to 0 we should stop. Either we set them to a junk value (-1) like liburing does, or we just don't set them. Setting them to 0 does absolutely nothing for us, since 0 would be a valid file descriptor if these ops took a file.

While the commit message in liburing sounds like its more concerned with the developers testing the io_uring interface, my main concern would be having the most compatible outcome with old kernels that don't recognize the operation you're using. To that end we wouldn't want it to accidentally perform some other operation on a valid fd the program is using.

It's sort of interesting that you got EBADFILE when you set the fd to -1 instead of still getting EINVAL.. Suggests the kernel version you're using is checking the fd before checking if the fd is even needed for this op.

I think we should set this to -1 like liburing does, to be most safe on systems that don't support the op.

Another more holistic option is to pursue ringbahn/uring-sys#4 and expose all the prep_* functions in uring-sys, then just call those from iou instead of redefining them in Rust and having to worry about these nuances!