spacejam/sled

io_uring: io_uring_setup syscall returns ENOMEM

Closed this issue · 5 comments

The io_uring_setup syscall returns ENOMEM when the code tries to allocate too many io_urings. Specifically, the test cargo test --features=testing,io_uring log_chunky_iterator spawns 100 threads, and io_uring_setup starts returning ENOMEM from roughly the 20th ring onward. This happens at the very beginning of io_uring setup, before the queues are even mmapped. I will trace the kernel functions to better understand where this limit comes from, but the fix in any case is to reduce the test's io_uring setup parallelism.
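For context, the failing step can be reproduced outside the test suite. Below is a minimal sketch (not sled code; the 256-entry ring size and the 100-ring loop are arbitrary stand-ins for the test's threads, and it assumes a >= 5.1 kernel with headers that expose __NR_io_uring_setup):

	#include <errno.h>
	#include <linux/io_uring.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void) {
		/* Create one ring per iteration and keep every fd open, so the
		 * pages accounted against the user's memlock budget accumulate. */
		for (int i = 0; i < 100; i++) {
			struct io_uring_params p;
			memset(&p, 0, sizeof(p));
			long fd = syscall(__NR_io_uring_setup, 256, &p);
			if (fd < 0) {
				/* ENOMEM is expected here once the budget is exhausted. */
				printf("ring %d failed: %s\n", i, strerror(errno));
				return 1;
			}
			printf("ring %d ok, fd=%ld\n", i, fd);
		}
		return 0;
	}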

It can be clearly observed with:

$ strace -f env cargo test --features=testing,io_uring log_chunky_iterator

Tracing notes: http://blog.vmsplice.net/2019/08/determining-why-linux-syscall-failed.html

$ sudo trace-cmd record -p function_graph -g __x64_sys_io_uring_setup
$ sudo trace-cmd report --cpu 0

It is indeed the case that the execution returns ENOMEM at:

log_chunky_iter-27409 [000] 40690.231487: funcgraph_exit:       ! 238.103 us |  } <-- good invocation
 log_chunky_iter-27466 [000] 40690.232135: funcgraph_exit:       ! 459.701 us |  } <-- good invocation
 log_chunky_iter-27387 [000] 40690.241284: funcgraph_exit:         2.434 us   |  } <-- bad invocation
 log_chunky_iter-27394 [000] 40690.243183: funcgraph_exit:         2.269 us   |  }
...
 log_chunky_iter-27437 [000] 40690.250563: funcgraph_entry:                   |  __x64_sys_io_uring_setup() {
 log_chunky_iter-27437 [000] 40690.250563: funcgraph_entry:                   |    io_uring_setup() {
 log_chunky_iter-27437 [000] 40690.250564: funcgraph_entry:                   |      capable() {
 log_chunky_iter-27437 [000] 40690.250564: funcgraph_entry:                   |        ns_capable_common() {
 log_chunky_iter-27437 [000] 40690.250564: funcgraph_entry:                   |          security_capable() {
 log_chunky_iter-27437 [000] 40690.250564: funcgraph_entry:                   |            cap_capable() {
 log_chunky_iter-27437 [000] 40690.250564: funcgraph_exit:         0.152 us   |            }
 log_chunky_iter-27437 [000] 40690.250564: funcgraph_exit:         0.489 us   |          }
 log_chunky_iter-27437 [000] 40690.250565: funcgraph_exit:         0.747 us   |        }
 log_chunky_iter-27437 [000] 40690.250565: funcgraph_exit:         0.984 us   |      }
 log_chunky_iter-27437 [000] 40690.250565: funcgraph_entry:                   |      free_uid() {
 log_chunky_iter-27437 [000] 40690.250565: funcgraph_exit:         0.143 us   |      }
 log_chunky_iter-27437 [000] 40690.250565: funcgraph_exit:         1.618 us   |    }
 log_chunky_iter-27437 [000] 40690.250565: funcgraph_exit:         2.215 us   |  }

From the 5.3.0 kernel (fs/io_uring.c):

	account_mem = !capable(CAP_IPC_LOCK);
	if (account_mem) {
		ret = io_account_mem(user,
				ring_pages(p->sq_entries, p->cq_entries));
		if (ret) {
			free_uid(user);
			return ret;
		}
	}

	ctx = io_ring_ctx_alloc(p);
	if (!ctx) {
		if (account_mem)
			io_unaccount_mem(user, ring_pages(p->sq_entries,
								p->cq_entries));
		free_uid(user);
		return -ENOMEM;
	}

So basically it fails in the accounting step (io_account_mem() above), at:

	/* Don't allow more pages than we can safely lock */
	page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

i.e. the number of pages the kernel will account as locked for the user is capped by RLIMIT_MEMLOCK.
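To illustrate the arithmetic, the same budget can be computed from userspace; this is just a sketch using getrlimit() (the shift by PAGE_SHIFT is equivalent to dividing by the page size):

	#include <stdio.h>
	#include <sys/resource.h>
	#include <unistd.h>

	int main(void) {
		struct rlimit rl;
		if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
			return 1;
		if (rl.rlim_cur == RLIM_INFINITY) {
			printf("RLIMIT_MEMLOCK is unlimited\n");
			return 0;
		}
		long page_size = sysconf(_SC_PAGESIZE);
		/* Mirrors the kernel's: page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT.
		 * With a common 64 KiB soft limit and 4 KiB pages this is only 16 pages,
		 * which a few dozen rings' SQ/CQ allocations exhaust quickly. */
		unsigned long pages = (unsigned long)rl.rlim_cur / (unsigned long)page_size;
		printf("RLIMIT_MEMLOCK soft = %llu bytes -> %lu lockable pages\n",
		       (unsigned long long)rl.rlim_cur, pages);
		return 0;
	}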

axboe commented

This can be worked around by raising the per-user memlock rlimit. It's generally pretty low on systems. See /etc/security/limits.{d,conf}
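For example, an entry along these lines in /etc/security/limits.conf (or a drop-in file under /etc/security/limits.d/) raises it; the values are in KiB, 4096 matches what turned out to be enough for the tests below, and <user> is a placeholder for the account running them. The active value can be checked with ulimit -l after logging in again:

	# maximum locked-in-memory address space, in KiB
	<user>  soft  memlock  4096
	<user>  hard  memlock  4096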

@axboe thanks!

@sitano thanks for diving into this! that definitely helps clarify things for me around why this was happening

Yeah, the only option is increasing the allowed amount of locked memory pages.
4096 KB is enough for all tests to pass.