Heap corruption when using `NamedPipeClientReader::read_full`

Question

Heap corruption when using `NamedPipeClientReader::read_full`

Closed this issue a year ago · 3 comments

I ran into a crash when I was doing some tests with Windows named pipes in message mode with this library, and it seems like it's related to some kind of memory unsafety in NamedPipeClientReader::read_full. I put up a repo with a reproducible test case here: https://github.com/kylewlacy/windows-test-named-pipe

When run with the default settings (cargo run in that repo), I reliably get this error:

error: process didn't exit successfully: `target\debug\test-named-pipes.exe` (exit code: 0xc0000374, STATUS_HEAP_CORRUPTION)

I've also on occasion seen this error message instead, but I haven't figured out how to reproduce it reliably:

Error: Invalid access to memory location. (0x800703E6)

Answer 1 · 2023-12-03T20:03:13.000Z

I'm wondering how much of the

Error: Invalid access to memory location. (0x800703E6)

Is related to the original issue that just got fixed. 🤔 Not sure if we can mark this issue as solved yet or not

Do you happen to know which function triggered this?

If you got this message often enough (even if unreliably), but it stops now (or not), that would probably serve as proof whether it's still here or not.

Answer 2 · 2023-12-03T21:56:33.000Z

I was getting both error messages with my windows-test-named-pipe test program, but I haven't either one again with the patch from #2 in place.

Most of the time when running, I was getting the "heap corruption" message, but if I played with the message sizes (the -m flag), then I would occasionally see the "invalid access" message instead. I didn't attach a debugger or anything, but I believe both errors were from the read_full method.

Windows is kind of a foreign land for me (I use it as my daily driver, but I basically live in WSL). My best guess is that the STATUS_HEAP_CORRUPTION error is triggered by the memory allocator, when it uses some heuristic to determine that a process has written outside of an allocation. On the other hand, the the "invalid access to memory location" message seems to be when the kernel sees a process try to read/write to a page that wasn't mapped to that process. If that guess is right, and assuming the code before #2 was merged had a bug causing it to write past the end of the allocated Vec, that could explain why the error is non-deterministic:

If the Vec gets allocated in a spot that's close to unmapped memory, then the write past the end of the buffer ends up in an unmapped region of memory, and the kernel kills the process with prejudice before the memory allocator catches it (with the "Invalid access to memory location" error).
Otherwise, the write past the end happens to stay within mapped memory, but the memory allocator sees that the memory outside of the program's allocations has been messed with, and so it kills the process with the STATUS_HEAP_CORRUPTION error before the kernel gets involved.

If that's accurate, then I'd also guess it would be possible to reproduce both errors using my test program, but by adding a useless Vec allocation and varying its size (which would change where the corrupted Vec gets allocated). But, since both errors seemed to go away after merging #2 (even when using large message sizes), I feel pretty confident that both errors I saw have been fully resolved. I feel it makes sense to mark this issue as "completed" until one or both error messages get spotted again.

Answer 3 · 2023-12-03T22:02:14.000Z

I think so too. If it was reproducible enough (even if unreliable) and is now gone, it's most likely read_full() was the original offender.