armanbilge/epollcat

`EpollAsyncSocketChannel#read` can complete with 0 bytes


I'm observing this in FS2, which uses this method:

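```scala
import java.nio.ByteBuffer
import java.nio.channels.CompletionHandler
import java.util.concurrent.TimeUnit

def read[A](
    dst: ByteBuffer,
    timeout: Long,
    unit: TimeUnit,
    attachment: A,
    handler: CompletionHandler[Integer, _ >: A]
): Unit = ??? // body not shown here
```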

It sometimes seems to complete the handler with a result of 0, i.e. reporting that it read 0 bytes.
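One way to confirm the symptom from the caller's side is to wrap the handler passed to `read` and count zero-byte completions. This is a minimal sketch that uses only the standard `java.nio.channels.CompletionHandler` interface; the class name `ZeroReadCounter` is hypothetical and is not part of FS2 or epollcat.

```scala
import java.nio.channels.CompletionHandler
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical diagnostic wrapper (not part of FS2 or epollcat): forwards every
// completion to `underlying`, counting the ones that report 0 bytes read.
final class ZeroReadCounter[A](underlying: CompletionHandler[Integer, A])
    extends CompletionHandler[Integer, A] {
  val zeroReads = new AtomicInteger(0)

  override def completed(result: Integer, attachment: A): Unit = {
    if (result.intValue == 0) zeroReads.incrementAndGet()
    underlying.completed(result, attachment)
  }

  override def failed(exc: Throwable, attachment: A): Unit =
    underlying.failed(exc, attachment)
}
```

Because the wrapper forwards everything unchanged, it can be dropped in around an existing handler without altering the read behavior being investigated.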

I'm not sure if this is a bug per se, but it may be evidence of some kind of inefficiency, since the only way to get this result is to get EWOULDBLOCK when calling posix.unistd.read for the first (and only) time.

The only "innocent" explanation I can think of is that epoll_wait is notifying us that data is ready when there actually isn't any. If that's not the case, then it's our mistake: we are somehow thinking a read is ready when it's not.

I do not know about epoll, but poll and select can return when they are interrupted by timers, etc. I think there are some bits in the arguments one must check to see whether the call returned because of a read or because of some interrupt, say a timer. Not to send you off into a rathole.
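For `epoll_wait(2)` specifically, interruptions show up in the return value rather than in the event bits. A signal arriving before any event or timeout makes it return -1 with `errno` set to `EINTR`, and an expired timeout makes it return 0. In neither case are any events reported. A small sketch of classifying the return value follows. The object and case names are hypothetical, and `EINTR` is hard-coded to its Linux value (4) purely for illustration.

```scala
// Sketch of interpreting epoll_wait(2)'s return value. Names are hypothetical;
// EINTR is hard-coded to its Linux value for illustration only.
object EpollWaitResult {
  final val EINTR = 4

  sealed trait Outcome
  final case class Ready(count: Int) extends Outcome  // > 0: that many fds have events
  case object TimedOut extends Outcome                // 0: timeout expired, no events
  case object Interrupted extends Outcome             // -1 with EINTR: a signal arrived; retry
  final case class Failed(errno: Int) extends Outcome // -1 with any other errno

  def classify(ret: Int, errno: Int): Outcome =
    if (ret > 0) Ready(ret)
    else if (ret == 0) TimedOut
    else if (errno == EINTR) Interrupted
    else Failed(errno)
}
```

Under this reading, a timer or signal interrupt on its own would not hand any socket a READ event.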

Spurious wake-ups are annoying (or beyond).

I am at End-of-Sprint and can't look now but will try to this weekend.

Yes, you are right about that :) In this case, though, what's unexpected is not that epoll_wait completes.

When epoll_wait completes, we loop through the events and invoke the callbacks.

```scala
val triggeredEvents = epoll_wait(epfd, events, maxEvents, timeoutMillis)
if (triggeredEvents >= 0) {
  var i = 0
  while (i < triggeredEvents) {
    val event = events + i.toLong
    val cb = fromPtr[Int => Unit](event.data)
    try {
      cb(event.events.toInt)
      // ... (excerpt ends here)
```
Then, in the callback itself, we check which specific events we were notified about.

```scala
private def callback(events: Int): Unit = {
  if ((events & EpollExecutorScheduler.Read) != 0) {
    readReady = true
    if (readCallback != null) readCallback.run()
  }
  if ((events & EpollExecutorScheduler.Write) != 0) {
    writeReady = true
    if (writeCallback != null) writeCallback.run()
  }
}
```

So it's not only that epoll_wait completed. It also invoked the callback for this particular socket, and the event mask passed to that callback included the READ bit. That's a lot of coincidences 😛


There is one more innocent explanation: in our last read, we read exactly the number of bytes remaining on the socket and never called read one more time to get EAGAIN. The socket would then still be marked as read-ready, leaving a "trap" for the very next invocation :)
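The "trap" can be reproduced with a small model, and every name in it is hypothetical. `SimSocket` stands in for the kernel receive buffer, with -1 playing the role of `EAGAIN`/`EWOULDBLOCK`. `SimChannel` stands in for the readiness-flag logic and makes a single read attempt per call, as described in this issue. The `parkOnEagain` switch sketches one possible mitigation, which is not necessarily what epollcat does: on `EAGAIN`, clear the readiness flag and wait for the next read notification instead of completing with 0.

```scala
// Hypothetical model of the kernel receive buffer: only a byte count is tracked.
final class SimSocket {
  private var available = 0
  def deliver(n: Int): Unit = available += n
  /** Non-blocking read: returns bytes consumed, or -1 standing in for EAGAIN/EWOULDBLOCK. */
  def read(max: Int): Int =
    if (available == 0) -1
    else {
      val n = math.min(max, available)
      available -= n
      n
    }
}

// Hypothetical model of the channel's readiness flag. `parkOnEagain` selects
// between completing with 0 (the behavior in this issue) and one possible mitigation.
final class SimChannel(socket: SimSocket, parkOnEagain: Boolean) {
  private var readReady = false
  private var pending: Option[(Int, Int => Unit)] = None

  /** Called when the (simulated) event loop reports the fd as readable. */
  def onReadable(): Unit = {
    readReady = true
    pending.foreach { case (max, h) =>
      pending = None
      read(max, h)
    }
  }

  /** Makes at most one read attempt per call. */
  def read(max: Int, handler: Int => Unit): Unit =
    if (!readReady) pending = Some((max, handler)) // wait for a notification
    else {
      val n = socket.read(max)
      if (n >= 0) handler(n) // readReady stays true even if the buffer is now empty
      else {
        readReady = false
        if (parkOnEagain) pending = Some((max, handler)) // wait instead of reporting 0
        else handler(0)                                   // the 0-byte completion
      }
    }
}
```

Consider delivering 8 bytes and then issuing two 8-byte reads. With `parkOnEagain = false`, the handlers complete with 8 and then 0. With `parkOnEagain = true`, the second read completes only after more data arrives and another read notification fires.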