scylladb/seastar

Assertion failure when HTTP server asks to listen to a taken port, with multiple cores

nyh opened this issue · 2 comments

nyh commented

This bug was discovered in scylladb/scylladb#16720. The applications runs a Seastar HTTP server by calling something like:

    _http_server.listen(socket_address(addr, port));

On all cores.

The bug is what happens if we try to listen on an already-taken port.
If there is just one core, this listen() throws good exception:

std::system_error (error system:98, posix_listen failed for address 127.0.0.1:9042: Address already in use)

But if there are multiple cores, while core 0 throws this good exception, all other cores have an assertion failure:

scylla: seastar/src/net/posix-stack.cc:559: virtual future<accept_result> seastar::net::posix_ap_server_socket_impl::accept(): Assertion `i.second' failed.
Aborting on shard 1.

The backtrace:

__assert_fail at ??:?
seastar::net::posix_ap_server_socket_impl::accept() at ??:?
seastar::server_socket::accept() at ??:?
seastar::httpd::http_server::do_accept_one(int, bool) at ??:?
seastar::httpd::http_server::do_accepts(int, bool) at ??:?
seastar::httpd::http_server::listen(seastar::socket_address, seastar::listen_options, seastar::shared_ptr<seastar::tls::server_credentials>) at ??:?
seastar::httpd::http_server::listen(seastar::socket_address) at ??:?

The code in src/net/posix-stack.cc has a strange assert(i.second), with no explanation what it means, and this is the assert that fails.

This assert was introduced by @gleb-cloudius almost a decade ago, in commit 19addf6. The commit message explains that "only one thread opens listen socket and runs accept(). Other threads emulate listen()/accept() by waiting for connected socket from the main thread.". I'm not 100% sure that this commit message is actually up to date, because I was under the impression that today all cores accept() connections - and send them to other cores. But I'm not sure.

This also raises another question: Is it important that http_server::listen() be called first on core 0, and only then on other cores? Right now I call all of them in parallel, and luckily it does run on core 0 first, but what if it runs first on core 1, before core 0?

nyh commented

This problem is not specific to the HTTP server implementation. We had the same problem in ScyllaDB in other non-http servers, which like the HTTP server listen()ed and accept()ed connections in all cores in parallel. The bad port is detected as expected on shard 0, but other shards result in the assertion failure shown above, and cause the whole application to crash. The problem should probably be fixed in the context of listen() and/or accept(), not the HTTP server. I'm removing the HTTP server label from the issue.

The code in src/net/posix-stack.cc has a strange assert(i.second), with no explanation what it means, and this is the assert that fails.

The assertion means, that the code tried to emplace some new object to wait for connection, but emplace returned that that object was already there. See below for why the object was expected to be new and why it turned out that it wasn't.

This assert was introduced by @gleb-cloudius almost a decade ago, in commit 19addf6. The commit message explains that "only one thread opens listen socket and runs accept(). Other threads emulate listen()/accept() by waiting for connected socket from the main thread.". I'm not 100% sure that this commit message is actually up to date, because I was under the impression that today all cores accept() connections - and send them to other cores. But I'm not sure.

Currently only one core accepts sockets for real, other cores wait on sockets from that core via "ap-server" thing:

network_stack_entry register_posix_stack() {
    return network_stack_entry{
        "posix", std::make_unique<program_options::option_group>(nullptr, "Posix"),
        [](const program_options::option_group& ops) {
            return smp::main_thread() ? posix_network_stack::create(ops)
                                      : posix_ap_network_stack::create(ops);
        },
        true};
}   

the main_thread() returns true to only one shard's thread

This also raises another question: Is it important that http_server::listen() be called first on core 0, and only then on other cores? Right now I call all of them in parallel, and luckily it does run on core 0 first, but what if it runs first on core 1, before core 0?

This is how it works.

First, the httpd::listen() calls seastar::listen() and .accept() on the returned server_socket. On shard 0 this maps to posix_network_stack::listen() + posix_server_socket_impl::accept(). On non-0 shards this maps to posix_ap_network_stack::listen() + posix_ap_server_socket_impl::accept(). The ap-socket's accept code emplaces a "new object" mentioned above into sockets map to listen for connections coming from main thread. The sockets map is std::unordered_map<std::tuple<int, socket_address>, promise<accept_result>>, where int is the protocol (PF_... value). So it maps socket address to a promise which will be resolved into connected socket later.

Second, yet another httpd::listen() is called and it does the very same thing -- calls seastar::listen() and .accept() on the returned socket and both map to the same pair of calls as in first step. On non-main thread the .accept() call, which maps to posix_ap_server_socket_impl::accept(), tries to emplace one more "new object" mentioned above into sockets map to listen for connections coming from main thread, but the key it emplaces, is already there after the first https::listen()! It's already there, because socket_address part is the same for both -- first and second invocations. Thus the assertion.