fasterthanlime/mevi

Unhandled mmap for a region without protection flags (`PROT_READ | PROT_WRITE`)


Hi! It's me again ;) I've been struggling for a couple of weeks now to figure out why my Rust debuginfod server is consuming an unexpected amount of resident memory (even after malloc_trim), and I thought your tool might help me. The project is not that complex, but it does multithreaded inspection of RPM files.

For my demo test case I get something like this:

$ RAYON_NUM_THREADS=100 mevi ./target/release/debuginfod-rs ~/Downloads/RPM-part
...
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
marxin   10970 21.7  0.1 8392676 105408 pts/4  Sl+  20:46   0:00 ./target/release/debuginfod-rs /home/marxin/Downloads/RPM-part

So my app consumes ~100 MiB of resident memory, while mevi reports less than 1 MiB, which is what I would expect:
[Screenshot from 2024-01-15 20-47-56: the mevi UI showing under 1 MiB resident]

Full MEVI log:
mevi-log.txt

Looking closer at the pmap output, there really are ranges with tens of mebibytes mapped:

Address           Kbytes     RSS     PSS   Dirty    Swap Mode  Mapping
00007f8e88000000   17544   14340   14340   14340       0 rw-p-   [ anon ]
00007f8e89122000   47992       0       0       0       0 ---p-   [ anon ]
00007f8e8c000000     132       4       4       4       0 rw-p-   [ anon ]
00007f8e8c021000   65404       0       0       0       0 ---p-   [ anon ]
00007f8e90000000   25548   24048   24048   24048       0 rw-p-   [ anon ]
00007f8e918f3000   39988       0       0       0       0 ---p-   [ anon ]
00007f8e94000000   12944    6164    6164    6164       0 rw-p-   [ anon ]
00007f8e94ca4000   52592       0       0       0       0 ---p-   [ anon ]
00007f8e98000000   11556    6152    6152    6152       0 rw-p-   [ anon ]
00007f8e98b49000   53980       0       0       0       0 ---p-   [ anon ]
00007f8e9c000000   10992    4128    4128    4128       0 rw-p-   [ anon ]
00007f8e9cabc000   54544       0       0       0       0 ---p-   [ anon ]
00007f8ea0000000   16240   16156   16156   16156       0 rw-p-   [ anon ]
00007f8ea0fdc000   49296       0       0       0       0 ---p-   [ anon ]
00007f8ea4000000   11312    4132    4132    4132       0 rw-p-   [ anon ]
00007f8ea4b0c000   54224       0       0       0       0 ---p-   [ anon ]
00007f8ea8000000    9860    4112    4112    4112       0 rw-p-   [ anon ]
00007f8ea89a1000   55676       0       0       0       0 ---p-   [ anon ]
00007f8eac000000   11980    6160    6160    6160       0 rw-p-   [ anon ]
00007f8eacbb3000   53556       0       0       0       0 ---p-   [ anon ]
00007f8eb0000000    9528    1016    1016    1016       0 rw-p-   [ anon ]
00007f8eb094e000   56008       0       0       0       0 ---p-   [ anon ]
00007f8eb4000000    9636    1612    1612    1612       0 rw-p-   [ anon ]
00007f8eb4969000   55900       0       0       0       0 ---p-   [ anon ]
00007f8eb8000000    8980     364     364     364       0 rw-p-   [ anon ]
00007f8eb88c5000   56556       0       0       0       0 ---p-   [ anon ]
00007f8ebc000000    8988     572     572     572       0 rw-p-   [ anon ]
00007f8ebc8c7000   56548       0       0       0       0 ---p-   [ anon ]
00007f8ec0000000     176     176     176     176       0 rw-p-   [ anon ]
00007f8ec002c000   65360       0       0       0       0 ---p-   [ anon ]
00007f8ec41e1000       4       0       0       0       0 ---p-   [ anon ]
00007f8ec41e2000    2048       8       8       8       0 rw-p-   [ anon ]
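
Note that each rw-p region together with the ---p region immediately following it adds up to exactly 65536 KiB, e.g. 17544 + 47992 = 65536 KiB = 64 MiB, so these look like fixed-size 64 MiB arenas of which only a part has been made accessible.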

Again, I know the tool is just a hobby project, but if you are interested, I can debug the problem myself. Any hint would be appreciated.

Thanks!

I narrowed it down a bit further for the demonstration (a single rayon thread is sufficient):

[2024-01-15T20:26:52.574Z INFO  debuginfod_rs] walking 1917 RPM files (5.6 GB) <- here the rayon thread is created
2024-01-15T20:26:52.574397Z  INFO mevi::tracer: [20459] cloned into [20465] (with SIGTRAP)
2024-01-15T20:26:52.574642Z  INFO mevi::tracer: [20465] => [20459] mapping 7f26248e0000..7f26248e3000 (12 KiB) with NotResident
2024-01-15T20:26:52.574729Z  INFO mevi::tracer: [20465] => [20459] unmapping 7f261c1ff000..7f2620000000 (62.00 MiB)
2024-01-15T20:26:52.574751Z  INFO mevi::tracer: [20465] => [20459] unmapping 7f2624000000..7f26241ff000 (2.00 MiB)
2024-01-15T20:26:52.576913Z  INFO mevi::tracer: [20465] => [20459] mapping 7f262479e000..7f26247bf000 (132 KiB) with NotResident
2024-01-15T20:26:52.576955Z  INFO mevi::tracer: [20465] => [20459] mapping 7f261f79f000..7f2620000000 (8.38 MiB) with NotResident
2024-01-15T20:26:52.577456Z  INFO mevi::tracer: [20465] => [20459] unmapping 7f262479e000..7f26247bf000 (132 KiB)
2024-01-15T20:26:52.577498Z  INFO mevi::tracer: [20465] => [20459] unmapping 7f261f79f000..7f2620000000 (8.38 MiB)

That's what I get if I also add an info message for munmap (there's no remap or pageout involved; I added asserts to verify).

Let's compare it with strace:

RAYON_NUM_THREADS=1 strace -s0 -f -e mmap,munmap ./target/release/debuginfod-rs ~/Downloads/RPM-part
[2024-01-15T20:29:33.335Z INFO  debuginfod_rs] walking 1917 RPM files (5.6 GB)
mmap(NULL, 2101248, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa62ebff000
strace: Process 20665 attached
[pid 20665] mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa62f168000
[pid 20665] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa626bff000
[pid 20665] munmap(0x7fa626bff000, 20975616) = 0
[pid 20665] munmap(0x7fa62c000000, 46133248) = 0
[pid 20665] mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa62f026000
[pid 20665] mmap(NULL, 8785920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa62e39e000
[pid 20665] munmap(0x7fa62f026000, 135168) = 0
[pid 20665] munmap(0x7fa62e39e000, 8785920) = 0

(Note: the two outputs come from separate runs of the application, but the symptoms are the same.)
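
For scale: the mmap reserves 134217728 bytes = 128 MiB, and the two munmaps remove 20975616 + 46133248 = 67108864 bytes = 64 MiB, leaving exactly the 64 MiB window 0x7fa628000000..0x7fa62c000000, which starts at a 64 MiB-aligned address.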

What happens: after the stack is created for the thread, an mmap of 128 MiB happens, followed by munmaps totaling exactly 64 MiB. The problem is that mevi never sees the mmap:
[pid 20665] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa626bff000
(while the two munmaps of that region are properly covered).
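
If I had to guess (a hypothetical sketch, not the actual mevi source), the tracer only records mappings that request read/write protection, so a PROT_NONE reservation gets dropped entirely — something like:

```rust
// Hypothetical filter illustrating the suspected behavior; `prot` would be
// the protection argument decoded from the traced mmap syscall.
fn should_track(prot: libc::c_int) -> bool {
    prot & (libc::PROT_READ | libc::PROT_WRITE) != 0
}
```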

Any idea what could be happening?

Got it! Oh, it's smart:

[pid 24127] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6d743ff000
[pid 24127] munmap(0x7f6d743ff000, 62918656) = 0
[pid 24127] munmap(0x7f6d7c000000, 4190208) = 0
[pid 24127] mprotect(0x7f6d78000000, 135168, PROT_READ|PROT_WRITE) = 0

So it first calls mmap for a big chunk without any protection flags (PROT_NONE), then shrinks it with munmap, and only after that is the protection set to PROT_READ|PROT_WRITE via mprotect!
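
For illustration, here is a minimal sketch of that allocation dance in Rust using the `libc` crate (my reconstruction of the pattern, not code from mevi or the allocator; the constants mirror the strace output above):

```rust
// A sketch of the observed pattern, assuming Linux and the `libc` crate.
use std::ptr;

const ARENA: usize = 64 * 1024 * 1024; // 64 MiB, matching the pmap pairs

fn main() {
    unsafe {
        // Step 1: reserve address space only. No PROT_READ|PROT_WRITE here,
        // so a tracer that keys on those flags never learns about the region.
        let base = libc::mmap(
            ptr::null_mut(),
            2 * ARENA, // 134217728 bytes, as in the strace output
            libc::PROT_NONE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
            -1,
            0,
        );
        assert_ne!(base, libc::MAP_FAILED);

        // Step 2: trim down to one 64 MiB-aligned arena -- these are the
        // two munmap calls seen in strace (unaligned head, then the tail).
        let start = base as usize;
        let aligned = (start + ARENA - 1) & !(ARENA - 1);
        if aligned > start {
            libc::munmap(start as *mut _, aligned - start);
        }
        let tail = aligned + ARENA;
        if start + 2 * ARENA > tail {
            libc::munmap(tail as *mut _, start + 2 * ARENA - tail);
        }

        // Step 3: only now make the pages actually needed accessible --
        // this is the mprotect(..., PROT_READ|PROT_WRITE) call above.
        libc::mprotect(
            aligned as *mut _,
            135168, // 132 KiB, as in the strace output
            libc::PROT_READ | libc::PROT_WRITE,
        );
    }
}
```

The design makes sense for an allocator: reserving twice the arena size with PROT_NONE costs no physical memory, guarantees a 64 MiB-aligned arena after trimming, and pages only become chargeable once mprotect grants access — which is exactly the step a tracer keyed on mmap protection flags would miss.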

That makes sense to me! Are you up for writing a PR for this? I don't have a lot of time to look at mevi right now but I'd happily review something.

Sure, I've just created a pull request for it.