Creating a slice of uninit memory is unsound (equivalent to mem::uninitialized)
Shnatsel opened this issue · 21 comments
flate2 creates slices of uninitialized memory in several places. The only two places where it happens when using the Rust backend are here and here. This is equivalent to the use of the now-deprecated `mem::uninitialized()`. Instead, either a slice of `MaybeUninit<T>` should be constructed and passed to the backends, or the backends should receive a structure that does not expose uninitialized memory by design, such as `Vec` or a `Vec`-like fixed-capacity view of memory.

If the backend does not overwrite the entire slice, this can become an exploitable security vulnerability.
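To make the issue concrete, here is a minimal sketch (not flate2's actual code) contrasting the unsound pattern with two safe alternatives; the buffer sizes and values are arbitrary:

```rust
use std::mem::MaybeUninit;

fn main() {
    let mut buf: Vec<u8> = Vec::with_capacity(1024);

    // UNSOUND: materializes a `&mut [u8]` over uninitialized memory,
    // equivalent to the deprecated `mem::uninitialized()`:
    //
    //     unsafe { buf.set_len(1024) };
    //     let out: &mut [u8] = &mut buf[..]; // UB, even if never read
    //

    // Sound alternative 1: pay for zero-initialization up front.
    buf.resize(1024, 0);
    let out: &mut [u8] = &mut buf[..];
    out[0] = 42;

    // Sound alternative 2: hand the backend `&mut [MaybeUninit<u8>]` and
    // only `set_len` over the part the backend actually wrote.
    let mut buf2: Vec<u8> = Vec::with_capacity(1024);
    let spare: &mut [MaybeUninit<u8>] = buf2.spare_capacity_mut();
    spare[0].write(7);
    // Suppose the backend reported writing 1 byte:
    unsafe { buf2.set_len(1) };
    assert_eq!(buf2, [7]);
}
```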
What is the status of this? Are these concerns purely hypothetical, or is there potentially unsoundness in the library?
This is potential unsoundness; the current Rust memory model (and the current LLVM memory model) treats uninitialized memory specially.

You can find more details about the way uninitialized memory is treated by C and Rust here: https://www.ralfj.de/blog/2019/07/14/uninit.html

`flate2` itself does not read from the backing slice, but the compiler is allowed to insert spurious reads for optimization purposes. Some examples (although for different kinds of UB) can be found here.

The nightly-only `BorrowedCursor` would make fixing this easier, but alas, it's nightly-only. The reasonable fix I can see is calling `.assume_init()` only after the slice is filled, and only on the filled parts, for C libraries, and using `miniz_oxide`'s output to `Vec` to avoid going through `unsafe` in the pure-Rust codepath.
For the Rust back-end we shouldn't really need to use `unsafe` unless there is a very good reason for it. I'm not sure that, say, avoiding zero-initializing a vector is worth it. `miniz_oxide` already avoids `unsafe` without any drastic performance implications, so I'm not sure why `flate2` would need it either.
FWIW I've attempted to address this by switching to naive safe code (using `vec.resize(capacity, 0)` to fill the spare capacity with zeros). See the PR here: #373
Going via `&mut [MaybeUninit<u8>]` may also be possible, but it would require some extra considerations:

- `Vec::spare_capacity_mut` was introduced in Rust 1.60.0. Thankfully this seems compatible with this crate's MSRV policy.
- `MaybeUninit<u8>` support would have to be plumbed into the following places, which currently write their output to `&mut [u8]`:
  - `Compress::compress`, which passes `output` to `DeflateBackend::compress`, which has the following `impl`s:
    - C-based `DeflateBackend::compress`
    - Rusty `DeflateBackend::compress`, which calls into `deflate` in `miniz_oxide`
  - `Decompress::decompress`, which passes `output` to `InflateBackend::decompress`, which has the following `impl`s:
    - C-based `InflateBackend::decompress`
    - Rusty `InflateBackend::decompress`, which calls into `inflate` in `miniz_oxide`
If you are creating a vector from scratch, using `vec![0; capacity]` is faster than `vec.resize(capacity, 0)` because it can request already-zeroed memory from the OS.
> If you are creating a vector from scratch, using `vec![0; capacity]` is faster than `vec.resize(capacity, 0)` because it requests already-zeroed memory from the OS.
Ack.
Here (e.g. in `compress_vec`) we are not creating a `Vec` from scratch, but instead we are working with the extra space in the spare capacity of an existing `Vec`. (OTOH, I guess that the caller of `compress_vec` may very well call `Vec::with_capacity` to create a `Vec` from scratch.)

Still, maybe using `vec.resize(capacity, 0)` unnecessarily pessimizes the performance here. One alternative would be to use `Vec::spare_capacity_mut` and then (somehow... waving hands...) use `unsafe` code to fill the spare capacity with zeros and treat it as `&mut [u8]`.
- I think that `slice::fill` doesn't help, because it requires an already-initialized `&mut [u8]` as input.
- FWIW https://users.rust-lang.org/t/zeroing-a-slice-of-integers/33825 mentions:
  - `safemem::write_bytes` - takes `&mut [u8]` as input and therefore doesn't help
  - `zeroize` - also doesn't work with `MaybeUninit`
  - `ptr::write_bytes` - maybe that's what we have to use (if `Vec::resize` is a no-go):
So, maybe I should change the PR to use a little bit of `unsafe` as follows?
```rust
fn write_to_spare_capacity_of_vec<T>(
    output: &mut Vec<u8>,
    writer: impl FnOnce(&mut [u8]) -> (usize, T),
) -> T {
    let zeroed_spare_space: &mut [u8] = {
        let uninited_spare_space = output.spare_capacity_mut();
        let ptr = uninited_spare_space.as_mut_ptr() as *mut u8;
        let len = uninited_spare_space.len();
        unsafe {
            // Safety of `write_bytes`:
            // * `ptr` is non-null and points to `len` bytes within a single allocated object
            //   (guaranteed by `spare_capacity_mut` returning `&mut [MaybeUninit<u8>]`).
            // * Alignment of `u8` is 1 (so, not a concern).
            core::ptr::write_bytes(ptr, 0, len);
            // Safety of `from_raw_parts_mut`:
            // * `ptr` points to `len` consecutive, properly initialized values of type `u8`
            //   (guaranteed by our call to `ptr::write_bytes` above).
            // * The memory referenced by the returned slice is not accessed through any other
            //   pointer (not derived from the return value) for the duration of its lifetime
            //   (guaranteed by exclusivity of the `&mut [MaybeUninit<u8>]` returned by
            //   `spare_capacity_mut`).
            core::slice::from_raw_parts_mut(ptr, len)
        }
    };
    let (bytes_written, ret) = writer(zeroed_spare_space);
    // Safety: Usage of `core::cmp::min` sanitizes `bytes_written` (making `set_len` safe even
    // if `writer` misbehaves and returns an arbitrary `bytes_written`). Note that above we
    // have already made sure that all spare-capacity bytes are initialized via
    // `ptr::write_bytes`.
    let new_len = core::cmp::min(output.len() + bytes_written, output.capacity());
    unsafe { output.set_len(new_len); }
    ret
}
```
WDYT?
I don't see why any of those solutions would be better than `vec.resize(capacity, 0)`.

We could avoid re-initializing the same part over and over by keeping track of what part of the `Vec` has already been initialized in previous iterations, but that relies on no other code running `mem::replace` on our `Vec`, and also on the `Vec` never reallocating, which is very fragile.

`BorrowedBuf` + `BorrowedCursor` could help here, but that's nightly-only.
The approach of writing to the spare capacity without zeroing it first sounds good, provided that the C library wrappers are adjusted to write to a `&mut [MaybeUninit<u8>]`.
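A minimal sketch of what such a wrapper could look like. The `c_like_writer` function below is a hypothetical stand-in for a C routine that fills a raw buffer and reports how many bytes it produced; it is not flate2's actual FFI surface:

```rust
use std::mem::MaybeUninit;

// Hypothetical stand-in for a C decompression routine: writes `len`
// bytes through a raw pointer and returns how many bytes it produced.
unsafe fn c_like_writer(out: *mut u8, len: usize) -> usize {
    for i in 0..len {
        out.add(i).write(0xFF);
    }
    len
}

// Wrapper taking `&mut [MaybeUninit<u8>]`: no `&mut [u8]` over
// uninitialized memory is ever created on the Rust side.
fn decompress_into(spare: &mut [MaybeUninit<u8>]) -> usize {
    unsafe { c_like_writer(spare.as_mut_ptr() as *mut u8, spare.len()) }
}

fn main() {
    let mut buf: Vec<u8> = Vec::with_capacity(16);
    // Cap the view at 16 bytes (the allocator may give extra capacity).
    let written = decompress_into(&mut buf.spare_capacity_mut()[..16]);
    // Safety: the first `written` bytes were initialized by the writer.
    unsafe { buf.set_len(written) };
    assert_eq!(buf, vec![0xFF; 16]);
}
```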
> The approach of writing to the spare capacity without zeroing it first sounds good, provided that the C library wrappers are adjusted to write to a `&mut [MaybeUninit<u8>]`.
Yes, that would work for the C backend.

What about the Rust backend though? The Rust backend would also need to be adjusted to write to a `&mut [MaybeUninit<u8>]`, right? If so, then a prerequisite for this approach would be to also change the `miniz_oxide` crate as outlined in #220 (comment). Do you think that we should pursue such changes to the `miniz_oxide` crate?
`miniz_oxide` always zeroes the output buffer for safety, even when writing to a `Vec`: https://docs.rs/miniz_oxide/0.7.1/src/miniz_oxide/deflate/mod.rs.html#121-157

I think it's best to just zero the buffer when using `miniz_oxide` instead of revamping its internals to use `MaybeUninit`.
Using `MaybeUninit` requires changing the public API of https://docs.rs/flate2/latest/flate2/struct.Compress.html, right? I see two options:

- Option 1: Add a new `pub fn compress_uninit(..., output: &mut [MaybeUninit<u8>], ...)`
- Option 2: Change the type of `output` in the existing `pub fn compress`

I guess Option 1 sounds better / less disruptive? Can you please provide feedback on this?
And then let me also mention that we would use `MaybeUninit` in the C/Rust-agnostic `InflateBackend::decompress` (and in `DeflateBackend::compress`). C would rely on C-side initialization, while Rust would zero-initialize before calling into `miniz_oxide`. Please shout if this doesn't sound right.

I note that this means that Rust may repeatedly initialize the same bytes over and over again, as you point out in #220 (comment). Is this okay?
The `compress_vec` and `decompress_vec` functions are really just convenience wrappers around `compress`/`decompress` (which operate on a slice), with some `unsafe` code to use the allocated unused capacity of the `Vec`. So I'm not sure there is strictly a need for them to exist in their current form either. It may be better to leave the output buffer handling to the caller, or to suggest simply using `compress`/`decompress` (which write directly to a slice) if callers need maximum flexibility and the zero-initialization in `compress_vec` is too much of an impact.

Whether a new `pub fn compress_uninit(..., output: &mut [MaybeUninit<u8>], ...)` would have a benefit beyond the current functions is a separate question.
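The "leave the output buffer handling to the caller" idea can be sketched as follows. Here `copy_compress` is a hypothetical stand-in for a slice-based API like `Compress::compress`; the library only ever sees a fully initialized `&mut [u8]`, and the caller decides how to allocate and reuse the buffer:

```rust
// Hypothetical stand-in for a slice-based compression call: fills as
// much of `output` as it can and returns the number of bytes written.
fn copy_compress(input: &[u8], output: &mut [u8]) -> usize {
    let n = input.len().min(output.len());
    output[..n].copy_from_slice(&input[..n]);
    n
}

fn main() {
    // Caller-owned, zero-initialized buffer: no uninitialized memory,
    // and the zeroing cost is paid exactly once.
    let mut out = vec![0u8; 8];
    let written = copy_compress(b"hello", &mut out);
    assert_eq!(&out[..written], b"hello");

    // The same buffer can be reused across calls with no reallocation.
    let written = copy_compress(b"world!", &mut out);
    assert_eq!(&out[..written], b"world!");
}
```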
Let me try to summarize the options we have identified so far (focusing on `compress_vec`, but the notes should also carry over to `decompress_vec`):

- Fix 1: Avoid `unsafe`.
  - One particular approach using `vec.resize(capacity, 0)` has been implemented in #373.
  - @Shnatsel points out in #220 (comment) that this may end up re-initializing the same part of memory over and over again. This would effectively change the complexity of filling a vector with N bytes of spare capacity from O(N) to O(N x M), where M is the number of `compress_vec` calls needed to fill N bytes. OTOH, the docs of `compress` say that it is "writing as much output as possible", which suggests that M should be 1 in practice.
- Fix 2: Switch the implementation of `compress_vec` to `MaybeUninit<u8>`/`Vec::spare_capacity_mut`.
  - This requires some cascading changes to use `MaybeUninit<u8>` in functions called by `compress_vec` (and maybe in functions called by those functions).
    - This is not a big problem for the C backend.
    - For the Rust backend this may require changing the `miniz_oxide` crate to provide a public API for writing to `&mut [MaybeUninit<u8>]`. Alternatively, `flate2` could initialize the buffer before calling into `miniz_oxide` (but IIUC this would have similar characteristics and concerns to just doing `vec.resize(capacity, 0)` - the Fix 1 approach).
  - This would also require tweaking the `flate2::Compress` struct. At the very least, we would need an internal API that can write to `&mut [MaybeUninit<u8>]` (e.g. the `compress_uninit` suggested as Option 1 in #220 (comment)).
- Fix 3: Remove the `compress_vec` API altogether.
  - AFAIU this is suggested by @oyvindln in #220 (comment).
The performance impact of Fix 1 and the breaking impact of Fix 3 depend on how `flate2` is used in practice. The impact may not be that great, based on the top 10 crates from https://crates.io/crates/flate2/reverse_dependencies, which are listed below:

- `syn` - no calls to `compress_vec` nor `decompress_vec` - see the search query here
- `object` - calls `decompress_vec` on a vector created with `Vec::with_capacity` (see here)
- `rustix` - no hits - see here
- `tonic` - no hits
- `zip` - no hits
- `png` - no hits
- `tower-http` - no hits
- `async-compression` - no hits
- `actix-http` - no hits
- `cxxbridge-macro` - no hits
If calls to `compress_vec` and `decompress_vec` are uncommon, then I'd rather make them slightly slower through zero-initialization than remove these functions outright.
> If calls to `compress_vec` and `decompress_vec` are uncommon, then I'd rather make them slightly slower through zero-initialization than remove these functions outright.
Ack. Both Fix 1 and Fix 3 are easy ways to avoid UB. I can put up a PR for either one. OTOH, I have a slight preference toward Fix 3 (removal), because I don't see the benefit of keeping these functions. It seems that making these functions safe (Fix 1) will make them somewhat performance-hostile (i.e. they can lead to repeatedly reinitializing a slice of memory, which means that using `compress_vec` can regress, rather than improve, performance compared to `compress`). Maybe I missed some other benefits of keeping these functions around?
/cc @Byron
Can you please provide your feedback on this issue? Which of the fix options discussed above would you prefer? (e.g. out of the ones in #220 (comment))
It may be worth continuing the community discussion on this issue, but I also wonder who the decision maker here is (i.e. who can review, approve, and merge PRs)? I see that you've merged most of the recent PRs, so I hope you won't mind being CC-ed / summoned into this issue :-).
Thanks for pulling me in - I was too busy in the past days, but am glad to join now since all information has been gathered so neatly. It took an hour to catch up on all that was said, with a considerable portion spent on reading Ralf Jung's excellent articles (again, probably, I keep forgetting).
With that said, here is my amateur perspective on uninitialized memory:
- It's like `Option<u8>`, and hence observing it is forbidden until it becomes `Some(v)`. If it's observed anyway, it's UB.
- Creating a slice from uninitialized memory is UB because it allows reading. Here it's less about whether it's actually read, but that it can be read, and strange optimizations can happen if this is done. The compiler can also insert spurious reads for optimization, which may cause said memory to be read even though it looks like it will only be written to.
So for me it's also established that the current behaviour of `compress_vec` needs fixing.

I think it's worth spelling out that correctness always trumps performance, so performance considerations shouldn't hold back a simple fix - optimizations, possibly like those discussed, can be made later if there is demand.

As a general stance, we will avoid breaking changes, as these would ripple through the ecosystem with a lot of churn, so any fix here must be non-breaking.

From the above it seems clear that #373 is a suitable and welcome fix.
Regarding performance
`gitoxide` is an avid user of `flate2`. The most performance-sensitive code is stream decompression, which happens to use `decompress()` with its own buffer management. I spent some time profiling the code to arrive there, and it appears that those who truly depend on the fastest possible implementation wouldn't use `(de)compress_vec()` anyway. `gitoxide` settled for reusing allocated buffers.

It's notable that `gitoxide` still doesn't re-use inflate state, which it truly should for even more performance, as this code is called a whole lot, but I digress.
Conclusion
Yes, let's merge the fix and not worry about performance, as those with performance concerns will have suitable alternatives. Correctness is more important.

Maybe there are more spots of code that have soundness issues like this? I'd love to see them rooted out. Thanks everyone for your help and engagement, it's much appreciated. I am grateful too, as I feel I became a little more aware and could adjust my mental model of uninitialized memory. I think `gitoxide` also has these issues and I will revisit them (i.e. `v = Vec::with_capacity(x); v.set_len(x); &mut v`).
Any plans for releasing a new version of the crate? (If so, it would be greatly appreciated, as it would help me get approval to import the crate into my project without having to carry a patch on top of the last released version.)
> It's like `Option<u8>` and hence observing it is forbidden until it becomes `Some(v)`. If it's observed anyway, it's UB.
This is incorrect, and a common misapprehension. Having uninitialized memory is instantly UB even if you never read from it. I.e. `MaybeUninit::uninit().assume_init()` is always UB even if you ignore the value.

The exception to this is `MaybeUninit` itself and certain aggregates of it (like arrays of `MaybeUninit`), which are allowed to inhabit the uninit state.
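A short sketch of this distinction (the array size and values are arbitrary); the `transmute` at the end is the stable-Rust equivalent of the unstable `array_assume_init`:

```rust
use std::mem::MaybeUninit;

fn main() {
    // OK: an array of `MaybeUninit` may inhabit the uninit state.
    let mut arr: [MaybeUninit<u8>; 4] = [MaybeUninit::uninit(); 4];

    // UB, even though the value is never read afterwards:
    //
    //     let _x: u8 = unsafe { MaybeUninit::<u8>::uninit().assume_init() };
    //

    // Sound: initialize every element first, then convert.
    for (i, slot) in arr.iter_mut().enumerate() {
        slot.write(i as u8);
    }
    // Safety: all 4 elements were written above, and the two array types
    // have identical layout.
    let vals: [u8; 4] = unsafe { std::mem::transmute(arr) };
    assert_eq!(vals, [0, 1, 2, 3]);
}
```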
Similarly, regarding:

> Creating a slice from uninitialized memory is UB because it allows reading

it's UB automatically, regardless of what it allows.
Uninitialised memory is even meaner than I thought 😅.
I will prepare a new release now.