netty/netty

Selector.select() returned prematurely 512 times in a row; rebuilding selector while run testsuite

Closed this issue · 8 comments

The following is logged on my crunchbang linux installation:

21:30:14.072 [Thread-1] INFO  i.n.t.t.socket.SocketStartTlsTest - Running: testStartTls[0: serverEngine = io.netty.handler.ssl.JdkSslServerContext@50aeaf68, clientEngine = io.netty.handler.ssl.JdkSslClientContext@33bd9009] 3 of 3 (ServerBootstrap(group: OioEventLoopGroup, channelFactory: OioServerSocketChannel.class, localAddress: /0:0:0:0:0:0:0:1%1:33570, options: {SO_TIMEOUT=10, ALLOCATOR=UnpooledByteBufAllocator(directByDefault: true)}, childGroup: OioEventLoopGroup, childOptions: {ALLOCATOR=UnpooledByteBufAllocator(directByDefault: true)}) + Bootstrap(group: NioEventLoopGroup, channelFactory: NioSocketChannel.class, options: {ALLOCATOR=UnpooledByteBufAllocator(directByDefault: true)}, remoteAddress: /0:0:0:0:0:0:0:1%1:33570)) with UnpooledByteBufAllocator
21:30:14.075 [testsuite-nio-worker-4-3] WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding selector.
21:30:14.075 [testsuite-nio-worker-4-3] INFO  io.netty.channel.nio.NioEventLoop - Migrated 1 channel(s) to the new Selector.
21:30:14.075 [testsuite-nio-worker-4-3] WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding selector.
21:30:14.075 [testsuite-nio-worker-4-3] INFO  io.netty.channel.nio.NioEventLoop - Migrated 1 channel(s) to the new Selector.
21:30:14.076 [testsuite-nio-worker-4-3] WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding selector.
➜  testsuite git:(4.0) ✗ java -version
java version "1.7.0_55"
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-2)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
➜  testsuite git:(4.0) ✗ uname -a
Linux dev05 3.14-1-amd64 #1 SMP Debian 3.14.7-1 (2014-06-16) x86_64 GNU/Linux

This is reproducible all the time

After more debugging this only happens on linux kernel 3.14. After downgrade to kernel 3.10.45 all works like expected.

The problem can be reproduced with pure java.nio:

import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.TimeUnit;

public class SelectorBug {

    public static void main(String[] args) throws Exception {

        SelectorLoop selectThread = new SelectorLoop();
        selectThread.start();
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        SelectionKey key = channel.register(selectThread.selector, 0, null);
        selectThread.selector.wakeup();
        Thread.sleep(10000);
        selectThread.shutdown();
        key.cancel();
    }

    private static final class SelectorLoop extends Thread {
        private volatile boolean shutdown;
        private final Selector selector;
        SelectorLoop() throws IOException {
            selector = Selector.open();
            setDaemon(true);
        }
        @Override
        public void run() {
            while (!shutdown) {
                try {
                    long before = System.nanoTime();
                    int selected = selector.select(1000);
                    System.out.println("Selected=" + selected + ' ' + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - before));
                    if (selected > 0) {
                        Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                        while (keys.hasNext()) {
                            keys.next();
                            keys.remove();
                        }
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                    break;
                }
            }
            try {
                selector.close();
            } catch (IOException e) {
                // ignore
            }
        }

        public void shutdown() {
            shutdown = true;
        }
    }
}

Normally you should not see more then about 11 print statements. With kernel 3.14 I see thousands.

Linux Kernel 3.15:

trustin@infinity:/home/trustin/Sandbox
$ uname -r
3.15.2-1-ARCH
trustin@infinity:/home/trustin/Sandbox
$ java -cp . SelectorBug
Selected=0 1001
Selected=0 0
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
trustin@infinity:/home/trustin/Sandbox
$ 

Also fixed in latest 3.14.10 kernel:

➜  src  java SelectorBug      
Selected=0 1
Selected=0 1001
Selected=0 1001
Selected=0 1002
Selected=0 1001
Selected=0 1002
Selected=0 1002
Selected=0 1001
Selected=0 1001
Selected=0 1001
➜  src  uname -r
3.14.10

I've the same problem with linux kernel 4.2 and 3.16. (Debian Jessie)

Also seeing this with a custom kernel 4.13.3 on Gentoo. Any idea if this is a kernel configuration issue, not a kernel bug?

@timboudreau sorry no idea :(

We're seeing this as well on our Play! app.

Netty 4.0.51.Final

java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

uname -r : 4.9.20-11.31.amzn1.x86_64

Happens under heavy load.

Any help would be appreciated.

Tagar commented

Happens on RHEL 7.4 too (3.10.0-693.2.2.el7.x86_64) on heavy load.

Curious if these two tunables can workaround this issue?

$ sudo sysctl -a | grep poll
fs.epoll.max_user_watches = 108141834
net.core.busy_poll = 0