Unhandled mmap for a region without protection flags (no `PROT_READ | PROT_WRITE`)
Hi! It's me again ;) Well, for a couple of weeks now I've been struggling to find out why my Rust debuginfod is consuming an unexpected amount of resident memory (even after `malloc_trim`), and I thought your tool might help me. The project is not very complex, but it does multithreaded inspection of RPM files.
For my demo test case I get something like:
$ RAYON_NUM_THREADS=100 mevi ./target/release/debuginfod-rs ~/Downloads/RPM-part
...
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
marxin 10970 21.7 0.1 8392676 105408 pts/4 Sl+ 20:46 0:00 ./target/release/debuginfod-rs /home/marxin/Downloads/RPM-part
So my app consumes ~100 MiB of RSS, while mevi reports < 1 MiB, which is the footprint I would actually expect:
Full mevi log:
mevi-log.txt
Looking closer at the `pmap` output, there really are ranges with tens of mebibytes mapped:
Address Kbytes RSS PSS Dirty Swap Mode Mapping
00007f8e88000000 17544 14340 14340 14340 0 rw-p- [ anon ]
00007f8e89122000 47992 0 0 0 0 ---p- [ anon ]
00007f8e8c000000 132 4 4 4 0 rw-p- [ anon ]
00007f8e8c021000 65404 0 0 0 0 ---p- [ anon ]
00007f8e90000000 25548 24048 24048 24048 0 rw-p- [ anon ]
00007f8e918f3000 39988 0 0 0 0 ---p- [ anon ]
00007f8e94000000 12944 6164 6164 6164 0 rw-p- [ anon ]
00007f8e94ca4000 52592 0 0 0 0 ---p- [ anon ]
00007f8e98000000 11556 6152 6152 6152 0 rw-p- [ anon ]
00007f8e98b49000 53980 0 0 0 0 ---p- [ anon ]
00007f8e9c000000 10992 4128 4128 4128 0 rw-p- [ anon ]
00007f8e9cabc000 54544 0 0 0 0 ---p- [ anon ]
00007f8ea0000000 16240 16156 16156 16156 0 rw-p- [ anon ]
00007f8ea0fdc000 49296 0 0 0 0 ---p- [ anon ]
00007f8ea4000000 11312 4132 4132 4132 0 rw-p- [ anon ]
00007f8ea4b0c000 54224 0 0 0 0 ---p- [ anon ]
00007f8ea8000000 9860 4112 4112 4112 0 rw-p- [ anon ]
00007f8ea89a1000 55676 0 0 0 0 ---p- [ anon ]
00007f8eac000000 11980 6160 6160 6160 0 rw-p- [ anon ]
00007f8eacbb3000 53556 0 0 0 0 ---p- [ anon ]
00007f8eb0000000 9528 1016 1016 1016 0 rw-p- [ anon ]
00007f8eb094e000 56008 0 0 0 0 ---p- [ anon ]
00007f8eb4000000 9636 1612 1612 1612 0 rw-p- [ anon ]
00007f8eb4969000 55900 0 0 0 0 ---p- [ anon ]
00007f8eb8000000 8980 364 364 364 0 rw-p- [ anon ]
00007f8eb88c5000 56556 0 0 0 0 ---p- [ anon ]
00007f8ebc000000 8988 572 572 572 0 rw-p- [ anon ]
00007f8ebc8c7000 56548 0 0 0 0 ---p- [ anon ]
00007f8ec0000000 176 176 176 176 0 rw-p- [ anon ]
00007f8ec002c000 65360 0 0 0 0 ---p- [ anon ]
00007f8ec41e1000 4 0 0 0 0 ---p- [ anon ]
00007f8ec41e2000 2048 8 8 8 0 rw-p- [ anon ]
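(Each large `rw-p` region plus the `---p` region immediately after it adds up to exactly 65536 KiB: 17544 + 47992 = 65536, 132 + 65404 = 65536, and so on. That is 64 MiB per pair, which turns out to matter below.)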
Again, I know the tool is just a hobby project, but if you are interested, I can help debug the problem. In any case, any hint would be appreciated.
Thanks!
I narrowed it down a bit further (a single rayon thread is sufficient to demonstrate it):
[2024-01-15T20:26:52.574Z INFO debuginfod_rs] walking 1917 RPM files (5.6 GB) <- here the rayon thread is created
2024-01-15T20:26:52.574397Z INFO mevi::tracer: [20459] cloned into [20465] (with SIGTRAP)
2024-01-15T20:26:52.574642Z INFO mevi::tracer: [20465] => [20459] mapping 7f26248e0000..7f26248e3000 (12 KiB) with NotResident
2024-01-15T20:26:52.574729Z INFO mevi::tracer: [20465] => [20459] unmapping 7f261c1ff000..7f2620000000 (62.00 MiB)
2024-01-15T20:26:52.574751Z INFO mevi::tracer: [20465] => [20459] unmapping 7f2624000000..7f26241ff000 (2.00 MiB)
2024-01-15T20:26:52.576913Z INFO mevi::tracer: [20465] => [20459] mapping 7f262479e000..7f26247bf000 (132 KiB) with NotResident
2024-01-15T20:26:52.576955Z INFO mevi::tracer: [20465] => [20459] mapping 7f261f79f000..7f2620000000 (8.38 MiB) with NotResident
2024-01-15T20:26:52.577456Z INFO mevi::tracer: [20465] => [20459] unmapping 7f262479e000..7f26247bf000 (132 KiB)
2024-01-15T20:26:52.577498Z INFO mevi::tracer: [20465] => [20459] unmapping 7f261f79f000..7f2620000000 (8.38 MiB)
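(Notice that the first two unmaps, 62.00 MiB + 2.00 MiB, add up to exactly 64 MiB, and they bracket an untracked hole at 7f2620000000..7f2624000000 that is itself exactly 64 MiB and 64 MiB-aligned.)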
That's what I get if I also add an info message for `munmap` (there's no `remap` or `pageout` event here; I added an assert to verify).
Let's compare it with strace:
RAYON_NUM_THREADS=1 strace -s0 -f -e mmap,munmap ./target/release/debuginfod-rs ~/Downloads/RPM-part
[2024-01-15T20:29:33.335Z INFO debuginfod_rs] walking 1917 RPM files (5.6 GB)
mmap(NULL, 2101248, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa62ebff000
strace: Process 20665 attached
[pid 20665] mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa62f168000
[pid 20665] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa626bff000
[pid 20665] munmap(0x7fa626bff000, 20975616) = 0
[pid 20665] munmap(0x7fa62c000000, 46133248) = 0
[pid 20665] mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa62f026000
[pid 20665] mmap(NULL, 8785920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa62e39e000
[pid 20665] munmap(0x7fa62f026000, 135168) = 0
[pid 20665] munmap(0x7fa62e39e000, 8785920) = 0
(Note that the outputs come from two separate runs of the application, but the symptoms are the same.)
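The sizes line up, too: the mmap is 134217728 bytes (128 MiB), and the two munmaps trim 20975616 + 46133248 = 67108864 bytes (64 MiB) off it, leaving a 64 MiB block aligned to a 64 MiB boundary at 0x7fa628000000.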
What happens: after the stack is created for the thread, an mmap for 128 MiB happens, followed by munmaps totalling ~64 MiB. The problem is that mevi never catches the mmap:
[pid 20665] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa626bff000
(while the two munmaps of that particular region are properly tracked).
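My guess at the failure mode, as a hypothetical sketch (made-up names, not mevi's actual code): if the tracer only starts tracking a range when the mmap carries read/write protection, the `PROT_NONE` reservation is dropped on the floor, while the munmaps are still seen because munmap carries no protection flags to filter on:

```rust
use libc::{c_int, PROT_NONE, PROT_READ, PROT_WRITE};

// Stand-in for the tracer's real bookkeeping (hypothetical).
fn track_mapping(addr: usize, len: usize) {
    println!("tracking {:x}..{:x}", addr, addr + len);
}

// If the mmap handler bails out on PROT_NONE like this, the 128 MiB
// arena reservation never enters the range map.
fn on_mmap_return(addr: usize, len: usize, prot: c_int) {
    if prot & (PROT_READ | PROT_WRITE) == 0 {
        return; // the reservation is silently dropped here
    }
    track_mapping(addr, len);
}

fn main() {
    // glibc's arena reservation as seen in the strace: ignored.
    on_mmap_return(0x7fa626bff000, 134217728, PROT_NONE);
}
```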
Any idea what could be happening?
Got it! Oh, it's smart:
[pid 24127] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6d743ff000
[pid 24127] munmap(0x7f6d743ff000, 62918656) = 0
[pid 24127] munmap(0x7f6d7c000000, 4190208) = 0
[pid 24127] mprotect(0x7f6d78000000, 135168, PROT_READ|PROT_WRITE) = 0
So it first calls `mmap` for a chunk with no protection at all (`PROT_NONE`), then it shrinks it, and only after that is the protection set to `PROT_READ|PROT_WRITE`!
The glibc implementation confirms my theory: https://github.com/bminor/glibc/blob/master/malloc/arena.c#L404-L441
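To make that concrete, here is a minimal Rust reproduction of the glibc strategy (a sketch using the `libc` crate; the 64 MiB heap size and the 132 KiB figure mirror the values seen in the traces above):

```rust
use libc::{
    mmap, mprotect, munmap, MAP_ANONYMOUS, MAP_FAILED, MAP_PRIVATE, PROT_NONE, PROT_READ,
    PROT_WRITE,
};
use std::ptr;

const HEAP_MAX_SIZE: usize = 64 * 1024 * 1024; // glibc's per-arena heap size

fn main() {
    unsafe {
        // 1. Reserve twice the heap size, access-less: this claims address
        //    space only, nothing is resident. The 128 MiB PROT_NONE mmap.
        let len = 2 * HEAP_MAX_SIZE;
        let base = mmap(
            ptr::null_mut(),
            len,
            PROT_NONE,
            MAP_PRIVATE | MAP_ANONYMOUS,
            -1,
            0,
        );
        assert_ne!(base, MAP_FAILED);

        // 2. Trim the unaligned head and tail so a single HEAP_MAX_SIZE
        //    block remains, aligned to HEAP_MAX_SIZE. These are the two
        //    munmap calls adding up to 64 MiB in the traces above.
        let start = base as usize;
        let aligned = (start + HEAP_MAX_SIZE - 1) & !(HEAP_MAX_SIZE - 1);
        if aligned > start {
            assert_eq!(munmap(base, aligned - start), 0);
        }
        let tail = start + len - (aligned + HEAP_MAX_SIZE);
        if tail > 0 {
            assert_eq!(munmap((aligned + HEAP_MAX_SIZE) as *mut _, tail), 0);
        }

        // 3. Only now grant access to the prefix the arena actually needs;
        //    132 KiB matches the mprotect in the strace output above.
        assert_eq!(
            mprotect(aligned as *mut _, 135_168, PROT_READ | PROT_WRITE),
            0
        );
    }
}
```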
That makes sense to me! Are you up for writing a PR for this? I don't have a lot of time to look at mevi right now but I'd happily review something.
Sure, I've just created a pull request for it.
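For reference, the shape such a fix could take, as a hypothetical sketch (made-up types, not the actual PR): keep `PROT_NONE` reservations in the tracer's bookkeeping, and report an `mprotect` that grants access as the moment the mapping becomes real:

```rust
use libc::{c_int, PROT_NONE, PROT_READ, PROT_WRITE};
use std::collections::BTreeMap;

// Hypothetical sketch of the fix's shape (made-up types, not the actual
// mevi PR): keep PROT_NONE reservations around so a later mprotect that
// grants access can be reported as the real mapping event.
struct Tracer {
    // reservation start -> length, for ranges mapped with PROT_NONE
    reserved: BTreeMap<usize, usize>,
}

impl Tracer {
    fn on_mmap(&mut self, addr: usize, len: usize, prot: c_int) {
        if prot & (PROT_READ | PROT_WRITE) == 0 {
            // Don't drop the reservation; remember it instead.
            self.reserved.insert(addr, len);
        } else {
            println!("mapping {:x}..{:x}", addr, addr + len);
        }
    }

    fn on_mprotect(&mut self, addr: usize, len: usize, prot: c_int) {
        let grants_access = prot & (PROT_READ | PROT_WRITE) != 0;
        // Does this fall inside a reservation we kept? (Shrinking the
        // reservation on munmap is omitted here for brevity.)
        let in_reservation = self
            .reserved
            .range(..=addr)
            .next_back()
            .map_or(false, |(&start, &rlen)| addr + len <= start + rlen);
        if grants_access && in_reservation {
            // The reserved memory just became usable: report it now.
            println!("mapping {:x}..{:x} (via mprotect)", addr, addr + len);
        }
    }
}

fn main() {
    let mut tracer = Tracer { reserved: BTreeMap::new() };
    // The sequence from the strace: reserve 128 MiB, then grant access.
    tracer.on_mmap(0x7f6d743ff000, 134217728, PROT_NONE);
    tracer.on_mprotect(0x7f6d78000000, 135168, PROT_READ | PROT_WRITE);
}
```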