`co-uring-http` is a high-performance HTTP server built on C++20 coroutines and `io_uring`. This project serves as an exploration of the latest features of the Linux kernel and is not recommended for production use. For learning purposes, `co-uring-http` re-implements general-purpose coroutine primitives (such as `task<T>` and `sync_wait<task<T>>`) instead of using existing libraries.
- Leverages C++20 coroutines to manage clients and handle HTTP requests, which reduces the mental overhead of writing asynchronous code.
- Leverages `io_uring` for handling async I/O operations, such as `accept()`, `send()`, `recv()`, and `splice()`, reducing the number of system calls.
- Leverages ring-mapped buffers to minimize buffer allocation costs and reduce data transfer between user and kernel space.
- Leverages multishot accept in `io_uring` to decrease the overhead of issuing `accept()` requests.
- Implements a thread pool to utilize all logical processors for optimal hardware parallelism.
- Manages the lifetime of `io_uring`, file descriptors, and the thread pool using RAII classes.
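As an illustration of the RAII point in the list above, here is a minimal sketch of a move-only descriptor wrapper. The names `unique_fd` and `is_open` are hypothetical and are not taken from the project's `file_descriptor.hpp`; the sketch only assumes POSIX `close()` and `fcntl()`.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <utility>

// Hypothetical stand-in for an owning descriptor class:
// it owns one descriptor and closes it exactly once.
class unique_fd {
public:
  unique_fd() noexcept = default;
  explicit unique_fd(int raw_fd) noexcept : raw_fd_{raw_fd} {}

  // Copying would lead to a double close, so only moves are allowed.
  unique_fd(const unique_fd &) = delete;
  auto operator=(const unique_fd &) -> unique_fd & = delete;

  unique_fd(unique_fd &&other) noexcept : raw_fd_{std::exchange(other.raw_fd_, -1)} {}
  auto operator=(unique_fd &&other) noexcept -> unique_fd & {
    if (this != &other) {
      reset();
      raw_fd_ = std::exchange(other.raw_fd_, -1);
    }
    return *this;
  }

  ~unique_fd() { reset(); }

  [[nodiscard]] auto get() const noexcept -> int { return raw_fd_; }
  [[nodiscard]] auto valid() const noexcept -> bool { return raw_fd_ >= 0; }

private:
  // Closes the descriptor if one is owned and marks the wrapper empty.
  auto reset() noexcept -> void {
    if (raw_fd_ >= 0) {
      ::close(raw_fd_);
      raw_fd_ = -1;
    }
  }

  int raw_fd_{-1};
};

// Returns true if the kernel still considers `raw_fd` open.
inline auto is_open(int raw_fd) -> bool { return ::fcntl(raw_fd, F_GETFD) != -1; }
```

With a wrapper like this, a descriptor is released when the owning object goes out of scope, including when a coroutine frame holding it is destroyed, so no explicit `close()` call is needed on error paths.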
- `task` (`task.hpp`): The `task` class represents a coroutine that does not start until it is awaited.
- `thread_pool` (`thread_pool.hpp`): The `thread_pool` class provides an abstraction that allows a coroutine to be scheduled on a fixed-size pool of threads. The number of threads is limited to the number of logical processors, allowing for hardware parallelism.
- `file_descriptor` (`file_descriptor.hpp`): The `file_descriptor` class owns a file descriptor. The `file_descriptor.hpp` file also provides general-purpose functions that work with the `file_descriptor` class, such as `open()`, `pipe()`, and `splice()`.
- `server_socket` (`socket.hpp`): The `server_socket` class extends the `file_descriptor` class and represents the listening socket that accepts clients. It provides an `accept()` method, which tracks whether a multishot `accept` request is already pending in `io_uring` and submits a new one if none exists.
- `client_socket` (`socket.hpp`): The `client_socket` class extends the `file_descriptor` class and represents the socket that communicates with a client. It provides a `send()` method, which submits a `send` request to `io_uring`, and a `recv()` method, which submits a `recv` request to `io_uring`.
- `io_uring` (`io_uring.hpp`): The `io_uring` class is a `thread_local` singleton that owns the submission queue and completion queue of `io_uring`.
- `buffer_ring` (`buffer_ring.hpp`): The `buffer_ring` class is a `thread_local` singleton that manages a collection of buffers supplied to `io_uring`. When an incoming HTTP request arrives, `io_uring` selects a buffer from the available pool and populates it with the received data, eliminating the need to allocate a new buffer for each request. Once the `http_parser` finishes parsing the request, the server returns the buffer to `io_uring`, enabling its reuse for future requests. The number of buffers and the size of each buffer are defined in `constant.hpp` and can be adjusted based on the estimated workload of the HTTP server.
- `http_server` (`http_server.hpp`): The `http_server` class creates a `thread_worker` task for each thread in the `thread_pool` and waits for these tasks to finish. The tasks finish when an exception occurs.
- `thread_worker` (`http_server.hpp`): The `thread_worker` class contains a collection of coroutines that will be scheduled on each thread. When the class is constructed, it spawns `thread_worker::accept_client()` and `thread_worker::event_loop()`.
  - The `thread_worker::event_loop()` coroutine processes events in the completion queue of `io_uring` and, for each event, resumes the coroutine that is waiting for its completion.
  - The `thread_worker::accept_client()` coroutine invokes `server_socket::accept()` to submit a multishot `accept()` request to `io_uring`, which generates a completion queue event for each incoming client. When a client arrives, it spawns the `thread_worker::handle_client()` coroutine.
  - The `thread_worker::handle_client()` coroutine invokes `client_socket::recv()` to wait for an HTTP request. When a request arrives, it parses the request with `http_parser` (`http_parser.hpp`), constructs an `http_response` (`http_message.hpp`), and sends the response with `client_socket::send()`.
    auto thread_worker::handle_client(client_socket client_socket) -> task<> {
      http_parser http_parser;
      buffer_ring &buffer_ring = buffer_ring::get_instance();
      while (true) {
        const auto [recv_buffer_id, recv_buffer_size] = co_await client_socket.recv(BUFFER_SIZE);
        const std::span<char> recv_buffer = buffer_ring.borrow_buffer(recv_buffer_id, recv_buffer_size);
        if (const auto parse_result = http_parser.parse_packet(recv_buffer); parse_result.has_value()) {
          const http_request &http_request = parse_result.value();
          // Processes the `http_request` and constructs an `http_response`
          // with a status that is either `200` or `404`
          // (Please refer to the source code)
          std::string send_buffer = http_response.serialize();
          co_await client_socket.send(send_buffer, send_buffer.size());
        }
        buffer_ring.return_buffer(recv_buffer_id);
      }
    }
- The `http_server` creates a `thread_worker` task for each thread in the `thread_pool` and awaits their completion.
- Each `thread_worker` creates a socket with the `SO_REUSEPORT` option, allowing the reuse of the same port, and spawns the `thread_worker::accept_client()` and `thread_worker::event_loop()` coroutines.
- When a client arrives, the `thread_worker::accept_client()` coroutine spawns a `thread_worker::handle_client()` coroutine to handle HTTP requests for that client.
- When either `thread_worker::accept_client()` or `thread_worker::handle_client()` awaits an asynchronous I/O operation (such as `send()` or `recv()`), it submits a request to the submission queue of `io_uring` and suspends its execution. Execution control is then transferred back to `thread_worker::event_loop()`.
- The `thread_worker::event_loop()` processes events in the completion queue of `io_uring`. For each event, it identifies the coroutine that is awaiting that event and resumes its execution.
The benchmark is performed with the `hey` benchmark tool, which sends 200 batches of requests, each containing 5,000 concurrent clients requesting a 1,024-byte file. `co-uring-http` serves 57,012 requests per second and handles 99% of requests within 0.2 seconds.
The benchmark is performed on UTM running on a MacBook Air (M1, 2020) with Linux kernel version `6.4.2`. The virtual machine has 4 cores and 8 GB of memory. The program is compiled with GCC 13 at optimization level `O3`.
./hey -n 1000000 -c 5000 http://127.0.0.1:8080/1k
Summary:
Total: 17.5400 secs
Slowest: 0.3872 secs
Fastest: 0.0001 secs
Average: 0.0824 secs
Requests/sec: 57012.6903
Total data: 1024000000 bytes
Size/request: 1024 bytes
Response time histogram:
0.000 [1] |
0.039 [18601] |■
0.077 [516164] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.116 [344028] |■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.155 [89199] |■■■■■■■
0.194 [22404] |■■
0.232 [3841] |
0.271 [3404] |
0.310 [1907] |
0.348 [221] |
0.387 [230] |
Latency distribution:
10% in 0.0519 secs
25% in 0.0611 secs
50% in 0.0752 secs
75% in 0.0973 secs
90% in 0.1213 secs
95% in 0.1392 secs
99% in 0.1922 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0002 secs, 0.0001 secs, 0.3872 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0001 secs, 0.0000 secs, 0.0966 secs
resp wait: 0.0413 secs, 0.0000 secs, 0.2908 secs
resp read: 0.0397 secs, 0.0000 secs, 0.3499 secs
Status code distribution:
[200] 1000000 responses
- Linux Kernel >= 5.19
- GCC >= 13 or Clang >= 14
  - Because `libc++` lacks support for certain C++20 features such as `jthread`, `co-uring-http` should link with GCC's `libstdc++`.
  - There's a bug in GCC <= 12 that introduces unexpected copies in the `co_await` expression, which will cause a segmentation fault in `co-uring-http`.
- liburing >= 2.3
The `.devcontainer/Dockerfile` provides a container image based on `ubuntu:lunar` with the required dependencies installed. Please note that the virtual machine of Docker on macOS is based on Linux kernel 5.15, which doesn't meet the requirements of `co-uring-http`.
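Since the kernel version is the requirement most likely to be missed in containers and virtual machines, a coarse runtime check can be sketched with POSIX `uname()`. The helper names below (`kernel_version`, `parse_kernel_release`, `meets_requirement`, `running_kernel_is_supported`) are hypothetical and not part of the project. Comparing version numbers is only an approximation, since it says nothing about how a particular kernel was configured.

```cpp
#include <sys/utsname.h>

#include <cstdio>
#include <optional>
#include <utility>

struct kernel_version {
  int major;
  int minor;
};

// Parses the leading "major.minor" of a release string such as "6.4.2-arch1".
inline auto parse_kernel_release(const char *release) -> std::optional<kernel_version> {
  int major = 0;
  int minor = 0;
  if (std::sscanf(release, "%d.%d", &major, &minor) != 2) {
    return std::nullopt;
  }
  return kernel_version{major, minor};
}

// Compares against the documented minimum (5.19 by default).
inline auto meets_requirement(kernel_version version, kernel_version required = {5, 19}) -> bool {
  return std::pair{version.major, version.minor} >= std::pair{required.major, required.minor};
}

// Returns true if the running kernel reports a version of at least 5.19.
inline auto running_kernel_is_supported() -> bool {
  utsname info{};
  if (::uname(&info) != 0) {
    return false;
  }
  const auto version = parse_kernel_release(info.release);
  return version.has_value() && meets_requirement(*version);
}
```

A server could call `running_kernel_is_supported()` at startup and exit with a clear message instead of failing later when an `io_uring` request is rejected.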
- Generate build configuration with CMake:
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER:FILEPATH=/usr/bin/clang -DCMAKE_CXX_COMPILER:FILEPATH=/usr/bin/clang++ -B build -G "Unix Makefiles"
- Build and run `co_uring_http`, which will listen on `localhost:8080` and serve static files:
make -C build -j$(nproc)
./build/co_uring_http
The default Linux kernel version for the Windows Subsystem for Linux (WSL) is 5.15, which does not support certain `io_uring` features, such as multishot accept or ring-mapped buffers. However, WSL provides a `.wslconfig` file that enables the use of a custom-built kernel image. It's recommended to build the kernel within an existing WSL instance.
- Install the build dependencies:
sudo apt install git bc build-essential flex bison libssl-dev libelf-dev dwarves
- Download the kernel source code (v6.4.2 in this example):
wget https://github.com/torvalds/linux/archive/refs/tags/v6.4.2.tar.gz -O v6.4.2.tar.gz
tar -xf v6.4.2.tar.gz
- Download the build configuration file for WSL:
wget https://raw.githubusercontent.com/microsoft/WSL2-Linux-Kernel/linux-msft-wsl-6.1.y/Microsoft/config-wsl
cp config-wsl linux-6.4.2/arch/x86/configs/wsl_defconfig
- Build the kernel:
cd linux-6.4.2
make KCONFIG_CONFIG=arch/x86/configs/wsl_defconfig -j$(nproc)
- Copy the kernel image to `$env:USERPROFILE` (the default user path) and set `.wslconfig` to use the kernel image:
powershell.exe /C 'Copy-Item .\arch\x86\boot\bzImage $env:USERPROFILE'
powershell.exe /C 'Write-Output [wsl2]`nkernel=$env:USERPROFILE\bzImage | % {$_.replace("\","\\")} | Out-File $env:USERPROFILE\.wslconfig -encoding ASCII'
- Restart WSL:
wsl --shutdown