io_uring: io_uring_setup syscall returns ENOMEM
Closed this issue · 5 comments
The io_uring_setup syscall returns ENOMEM when the code tries to allocate too many io_urings. Specifically, the test cargo test --features=testing,io_uring log_chunky_iterator spawns 100 threads, and io_uring_setup starts returning ENOMEM from roughly the 20th one. This happens at the very beginning of io_uring setup, before the queues are even mmapped. I will trace the kernel functions to better understand where this limit comes from, but the fix in any case is to reduce the test parallelism for io_uring setup.
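As a rough illustration of "reducing parallelism", here is a minimal sketch in Python (the project itself is Rust, so this is not the actual test code). It assumes the pages accounted to a ring stay charged for as long as the ring exists, so the cap has to cover the whole ring lifetime and not just the setup call. The names run_with_bounded_rings, ring_lifetime and MAX_LIVE_RINGS are hypothetical.

```python
import threading

MAX_LIVE_RINGS = 8  # hypothetical cap, not a value taken from the issue


def run_with_bounded_rings(n_threads, ring_lifetime, limit=MAX_LIVE_RINGS):
    """Run ring_lifetime() in n_threads threads, at most `limit` at a time.

    ring_lifetime is a stand-in for "create a ring, use it, drop it".
    Returns the peak number of calls that overlapped.
    """
    gate = threading.BoundedSemaphore(limit)
    lock = threading.Lock()
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with gate:  # blocks while `limit` rings are already alive
            with lock:
                active += 1
                peak = max(peak, active)
            try:
                ring_lifetime()
            finally:
                with lock:
                    active -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With a gate like this, 100 test threads can still be spawned, but only a bounded number of rings count against the limit at any moment.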
The ENOMEM failures are clearly visible under strace:
$ strace -f env cargo test --features=testing,io_uring log_chunky_iterator
Tracing notes: http://blog.vmsplice.net/2019/08/determining-why-linux-syscall-failed.html
$ sudo trace-cmd record -p function_graph -g __x64_sys_io_uring_setup
$ sudo trace-cmd report --cpu 0
The trace confirms that execution returns ENOMEM early: the failing invocations exit after about 2 us, right after the capable() check:
log_chunky_iter-27409 [000] 40690.231487: funcgraph_exit: ! 238.103 us | } <-- good invocation
log_chunky_iter-27466 [000] 40690.232135: funcgraph_exit: ! 459.701 us | } <-- good invocation
log_chunky_iter-27387 [000] 40690.241284: funcgraph_exit: 2.434 us | } <-- bad invocation
log_chunky_iter-27394 [000] 40690.243183: funcgraph_exit: 2.269 us | }
...
log_chunky_iter-27437 [000] 40690.250563: funcgraph_entry: | __x64_sys_io_uring_setup() {
log_chunky_iter-27437 [000] 40690.250563: funcgraph_entry: | io_uring_setup() {
log_chunky_iter-27437 [000] 40690.250564: funcgraph_entry: | capable() {
log_chunky_iter-27437 [000] 40690.250564: funcgraph_entry: | ns_capable_common() {
log_chunky_iter-27437 [000] 40690.250564: funcgraph_entry: | security_capable() {
log_chunky_iter-27437 [000] 40690.250564: funcgraph_entry: | cap_capable() {
log_chunky_iter-27437 [000] 40690.250564: funcgraph_exit: 0.152 us | }
log_chunky_iter-27437 [000] 40690.250564: funcgraph_exit: 0.489 us | }
log_chunky_iter-27437 [000] 40690.250565: funcgraph_exit: 0.747 us | }
log_chunky_iter-27437 [000] 40690.250565: funcgraph_exit: 0.984 us | }
log_chunky_iter-27437 [000] 40690.250565: funcgraph_entry: | free_uid() {
log_chunky_iter-27437 [000] 40690.250565: funcgraph_exit: 0.143 us | }
log_chunky_iter-27437 [000] 40690.250565: funcgraph_exit: 1.618 us | }
log_chunky_iter-27437 [000] 40690.250565: funcgraph_exit: 2.215 us | }
From the 5.3.0 kernel (fs/io_uring.c):
account_mem = !capable(CAP_IPC_LOCK);
if (account_mem) {
    ret = io_account_mem(user,
            ring_pages(p->sq_entries, p->cq_entries));
    if (ret) {
        free_uid(user);
        return ret;
    }
}
ctx = io_ring_ctx_alloc(p);
if (!ctx) {
    if (account_mem)
        io_unaccount_mem(user, ring_pages(p->sq_entries,
                    p->cq_entries));
    free_uid(user);
    return -ENOMEM;
}
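So it fails in the first branch, inside io_account_mem() (the trace shows capable() followed directly by free_uid()), because of this check: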
/* Don't allow more pages than we can safely lock */
page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
That is, the cap on locked pages derived from RLIMIT_MEMLOCK.
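To make the arithmetic concrete, here is a small Python sketch of the same computation done from userspace. It mirrors rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT, assuming a power-of-two page size. The pages_per_ring argument is a hypothetical input: the real value comes from the kernel's ring_pages() and depends on the queue sizes. The result is only an upper bound, since any other locked memory counted against the same limit reduces it.

```python
import resource


def page_limit(memlock_bytes, page_size):
    """Mirror `rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT` for a power-of-two page size."""
    page_shift = page_size.bit_length() - 1
    return memlock_bytes >> page_shift


def rings_that_fit(memlock_bytes, page_size, pages_per_ring):
    """Upper bound on rings under the limit; pages_per_ring is an assumed input."""
    return page_limit(memlock_bytes, page_size) // pages_per_ring


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    page_size = resource.getpagesize()
    if soft == resource.RLIM_INFINITY:
        print("RLIMIT_MEMLOCK is unlimited")
    else:
        print("page limit:", page_limit(soft, page_size))
```

For example, a 64 KiB limit with 4 KiB pages gives a page limit of 16.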
This can be worked around by raising the per-user memlock rlimit (RLIMIT_MEMLOCK). The default is generally pretty low on most systems. See /etc/security/limits.{d,conf}
@sitano thanks for diving into this! that definitely helps clarify things for me around why this was happening
yeah, the only option is increasing the allowed amount of locked memory. so yeah.
4096 KB is enough for all tests to pass.
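A minimal Python sketch of checking the limit before running the tests, assuming the 4096 KB figure above refers to the memlock limit. The helper target_soft_limit and the constant REQUIRED_KIB are hypothetical names. An unprivileged process can only raise its soft limit up to the hard limit, and the change applies to the current process and its children only. Anything beyond the hard limit still needs the limits.conf change mentioned earlier.

```python
import resource

REQUIRED_KIB = 4096  # figure quoted in this thread, assumed to be the memlock limit


def target_soft_limit(soft, hard, required):
    """Pick the soft RLIMIT_MEMLOCK to use, in bytes.

    Returns `soft` if it is already sufficient, `required` if the hard
    limit allows raising it, or None if the hard limit is too low.
    """
    inf = resource.RLIM_INFINITY
    if soft == inf or soft >= required:
        return soft
    if hard == inf or hard >= required:
        return required
    return None


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    target = target_soft_limit(soft, hard, REQUIRED_KIB * 1024)
    if target is None:
        print("hard limit too low; raise it via /etc/security/limits.conf")
    elif target != soft:
        resource.setrlimit(resource.RLIMIT_MEMLOCK, (target, hard))
        print("raised soft RLIMIT_MEMLOCK to", target)
    else:
        print("soft RLIMIT_MEMLOCK already sufficient")
```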