rust-lang/flate2-rs

Creating a slice of uninit memory is unsound (equivalent to mem::uninitialized)

Shnatsel opened this issue · 21 comments

flate2 creates slices of uninitialized memory in several places. The only two places where it happens when using the Rust backend are here and here.

This is equivalent to the use of the now-deprecated mem::uninitialized. Instead, either a slice of MaybeUninit<T> should be constructed and passed to the backends, or the backends should receive a structure that does not expose uninitialized memory by design such as Vec or a Vec-like fixed-capacity view of memory.

If the backend does not overwrite the entire slice, this can become an exploitable security vulnerability.

What is the status of this? Are these concerns purely hypothetical, or is there potentially unsoundness in the library?

This is potential unsoundness; the current Rust memory model (and the current LLVM memory model) treats uninits specially.

You can find more details about the way uninitialized memory is treated by C and Rust here: https://www.ralfj.de/blog/2019/07/14/uninit.html

flate2 itself does not read from the backing slice, but the compiler is allowed to insert spurious reads for optimization purposes. Some examples (although for different kinds of UB) can be found here.

The nightly-only BorrowedCursor would make fixing this easier, but alas, it's nightly-only. The reasonable fix I can see is calling .assume_init() only after the slice is filled and only on the filled parts for C libraries, and using miniz_oxide's output to Vec to avoid going through unsafe in the pure-rust codepath.
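The "assume_init only after the slice is filled, and only on the filled parts" idea can be sketched as follows. This is an illustrative shape, not flate2's actual API: the backend writes into uninitialized slots, reports how many it filled, and only that prefix is ever viewed as initialized bytes.

```rust
use std::mem::MaybeUninit;

// Illustrative sketch: `fill_prefix` is a hypothetical stand-in for a backend
// that writes into uninitialized slots and returns the initialized prefix.
fn fill_prefix(buf: &mut [MaybeUninit<u8>]) -> &[u8] {
    let data = b"deflate";
    let n = data.len().min(buf.len());
    for (slot, &byte) in buf.iter_mut().zip(data) {
        slot.write(byte); // initializes the slot without reading the old value
    }
    // Safety: exactly the first `n` elements were initialized by the loop above,
    // and we only expose that prefix.
    unsafe { std::slice::from_raw_parts(buf.as_ptr() as *const u8, n) }
}
```

The key property is that no `&[u8]` (or `&mut [u8]`) over uninitialized bytes ever exists; the conversion happens strictly after initialization and strictly on the written prefix.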

For the Rust backend we shouldn't really need unsafe at all unless there is a very good reason for it. I'm not sure that, say, avoiding zero-initializing a vector is worth it. miniz_oxide already avoids unsafe without any drastic performance implications, so I'm not sure why flate2 would need it either.

FWIW I've attempted to address this by switching to naive safe code (using vec.resize(capacity, 0) to fill the spare capacity with zeros). See the PR here: #373

Going via &mut [MaybeUninit<u8>] may also be possible, but it would require some extra considerations:

If you are creating a vector from scratch, using vec![0; capacity] is faster than vec.resize(capacity, 0) because it requests already-zeroed memory from the OS.

If you are creating a vector from scratch, using vec![0; capacity] is faster than vec.resize(capacity, 0) because it requests already-zeroed memory from the OS.

Ack.

Here (e.g. in compress_vec) we are not creating a Vec from scratch, but instead we are working with the extra space in the spare capacity of an existing Vec. (OTOH, I guess that the caller of compress_vec may very well call Vec::with_capacity to create a Vec from scratch.)

Still, maybe using vec.resize(capacity, 0) is unnecessarily pessimizing the performance here. One alternative would be to use Vec::spare_capacity_mut and then (somehow... waving hands...) use unsafe code to fill the spare capacity with zeros and treat it as &mut [u8].

So, maybe I should change the PR to use a little bit of unsafe as follows?:

fn write_to_spare_capacity_of_vec<T>(
    output: &mut Vec<u8>,
    writer: impl FnOnce(&mut [u8]) -> (usize, T),
) -> T {
    let zeroed_spare_space: &mut [u8] = {
        let uninited_spare_space = output.spare_capacity_mut();
        let ptr = uninited_spare_space.as_mut_ptr() as *mut u8;
        let len = uninited_spare_space.len();
        unsafe {
            // Safety of `write_bytes`:
            // * `ptr` is non-null and points to `len` bytes within a single allocated object
            //   (guaranteed by `spare_capacity_mut` returning `&mut [MaybeUninit<u8>]`).
            // * Alignment of `u8` is 1 (so, not a concern)
            core::ptr::write_bytes(ptr, 0, len);

            // Safety of `from_raw_parts_mut`:
            // * `ptr` points to `len` consecutive properly initialized values of type `u8`
            //   (guaranteed by our call to `ptr::write_bytes`).
            // * The memory referenced by the returned slice must not be accessed through any other
            //   pointer (not derived from the return value) for the duration of lifetime 'a
            //   (guaranteed by exclusivity of `&mut [T]` returned by `spare_capacity_mut`).       
            core::slice::from_raw_parts_mut(ptr, len)
        }
    };

    let (bytes_written, ret) = writer(zeroed_spare_space);

    // Safety: Usage of `core::cmp::min` sanitizes `bytes_written` (making `set_len` safe even if
    // `writer` misbehaves and returns an arbitrary `bytes_written`).  Note that above we have
    // already made sure that all spare-capacity bytes are initialized via `ptr::write_bytes`.
    let new_len = core::cmp::min(output.len() + bytes_written, output.capacity());
    unsafe { output.set_len(new_len); }

    ret
}

WDYT?

I don't see why any of those solutions would be better than vec.resize(capacity, 0).

We could avoid re-initializing the same part over and over by keeping track what part of the Vec has already been initialized in previous iterations, but that relies on no other code running mem::replace on our Vec and also the Vec never reallocating, which is very fragile.

BorrowedBuf + BorrowedCursor could help here, but that's nightly-only.

The approach of writing to the spare capacity without zeroing it first sounds good, provided that the C library wrappers are adjusted to write to a &mut [MaybeUninit<u8>].

The approach of writing to the spare capacity without zeroing it first sounds good, provided that the C library wrappers are adjusted to write to a &mut [MaybeUninit<u8>].

Yes, that would work for the C backend.

What about the Rust backend though? The Rust backend would also need to be adjusted to write to a &mut [MaybeUninit<u8>], right? If so, then a prerequisite for this approach would be to also change the miniz_oxide crate as outlined in #220 (comment). Do you think that we should pursue such changes to the miniz_oxide crate?

miniz_oxide always zeroes the output buffer for safety even when writing to a Vec: https://docs.rs/miniz_oxide/0.7.1/src/miniz_oxide/deflate/mod.rs.html#121-157

I think it's best to just zero the buffer when using miniz_oxide instead of revamping its internals to use MaybeUninit.

Using MaybeUninit requires changing the public API of https://docs.rs/flate2/latest/flate2/struct.Compress.html, right? I see two options:

  • Option 1: Add a new pub fn compress_uninit(..., output: &mut [MaybeUninit<u8>], ...)
  • Option 2: Change the type of output in the existing pub fn compress

I guess Option 1 sounds better / less disruptive? Can you please provide feedback on this?
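A minimal sketch of the Option 1 shape, with the caveat that the name `compress_uninit`, the `Compress` stub, and the stand-in body are all illustrative, not flate2's actual API. What matters is the signature: the caller hands over uninitialized slots and gets back a count of the bytes actually written.

```rust
use std::mem::MaybeUninit;

// Hypothetical stand-in for flate2::Compress, for illustration only.
struct Compress;

impl Compress {
    // Hypothetical Option 1 method: accepts uninitialized output slots and
    // returns how many were filled. A real implementation would run deflate;
    // this stub just copies the input so the shape can be exercised.
    fn compress_uninit(&mut self, input: &[u8], output: &mut [MaybeUninit<u8>]) -> usize {
        let n = input.len().min(output.len());
        for (slot, &byte) in output.iter_mut().zip(input) {
            slot.write(byte);
        }
        n
    }
}
```

compress_vec could then be layered on top: pass `Vec::spare_capacity_mut()` in, and `set_len` only by the returned count.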


And then let me also mention that we would use MaybeUninit in the C/Rust-agnostic InflateBackend::decompress (and in DeflateBackend::compress). And C would rely on C-side initialization, while Rust would zero-initialize before calling into miniz_oxide. Please shout if this doesn't sound right.

I note that this means that Rust may repeatedly initialize the same bytes over and over again as you point out in #220 (comment). Is this okay?

The compress_vec and decompress_vec functions are really just convenience wrappers around compress/decompress (which operate on a slice), wrapping some unsafe code to use the allocated unused capacity of the Vec, so I'm not sure there is strictly a need for them in their current form either. If zero-initialization in compress_vec is too much of an impact, it may be better to leave output-buffer handling to the caller, or to suggest simply using compress/decompress, which write directly to a slice, when maximum flexibility is needed.

Whether a new pub fn compress_uninit(..., output: &mut [MaybeUninit<u8>], ...) would have a benefit beyond the current functions is a separate question.

Let me try to summarize the options we have identified so far (focusing on compress_vec but the notes should also carry over to decompress_vec):

  • Fix1: Avoid unsafe.
    • One particular approach using `vec.resize(capacity, 0)` has been implemented in #373
    • @Shnatsel points out in #220 (comment) that this may end up re-initializing the same part of memory over and over again. This would effectively change the complexity of filling N bytes of spare capacity from O(N) to O(N x M), where M is the number of compress_vec calls needed to fill the N bytes. OTOH, the docs of compress say that it is "writing as much output as possible", which suggests that M should be 1 in practice.
  • Fix2: Switch implementation of compress_vec to MaybeUninit<u8> / Vec::spare_capacity_mut
    • This requires some cascading changes to use MaybeUninit<u8> in functions called by compress_vec (and maybe in functions called by those functions).
      • This is not a big problem for the C backend.
      • For the Rust backend this may require changing the miniz_oxide crate to provide a public API for writing to &mut [MaybeUninit<u8>]. Alternatively, flate2 could initialize the buffer before calling into miniz_oxide (but IIUC this would have similar characteristics and concerns to just doing vec.resize(capacity, 0) - the Fix1 approach)
      • This would also require tweaking the flate2::Compress struct. At the very least, we would need an internal API that can write to &mut [MaybeUninit<u8>] (e.g. the compress_uninit suggested as Option1 in #220 (comment)).
  • Fix3: Remove the compress_vec API altogether
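The Fix1 shape can be sketched as below (illustrative only; #373 may differ in detail): zero the spare capacity with `resize`, let the compressor fill it through a plain `&mut [u8]`, then truncate back down to the bytes actually produced. No `unsafe` is needed, at the cost of re-zeroing the spare capacity on every call.

```rust
// Sketch of Fix1: safe spare-capacity writing via resize + truncate.
// `write` is a stand-in for the compressor; it returns the bytes written.
fn append_via_zeroed_spare(output: &mut Vec<u8>, write: impl FnOnce(&mut [u8]) -> usize) {
    let old_len = output.len();
    let capacity = output.capacity();
    output.resize(capacity, 0); // safe, but re-zeroes the spare space each call
    let written = write(&mut output[old_len..]);
    // Clamp in case `write` misbehaves and over-reports.
    output.truncate(old_len + written.min(capacity - old_len));
}
```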

The performance impact of Fix1 and breaking impact of Fix3 depend on how flate2 is used in practice. The impact may not be that great based on the top 10 crates from https://crates.io/crates/flate2/reverse_dependencies which are listed below:

  • syn - no calls to compress_vec nor decompress_vec - see the search query here
  • object - calls decompress_vec on a vector created with Vec::with_capacity (see here)
  • rustix - no hits - see here
  • tonic - no hits
  • zip - no hits
  • png - no hits
  • tower-http - no hits
  • async-compression - no hits
  • actix-http - no hits
  • cxxbridge-macro - no hits

If calls to compress_vec and decompress_vec are uncommon, then I'd rather make them slightly slower through zero-initialization than remove these functions outright.

If calls to compress_vec and decompress_vec are uncommon, then I'd rather make them slightly slower through zero-initialization than remove these functions outright.

Ack. Both Fix1 and Fix3 are easy ways to avoid UB. I can put up a PR for either one. OTOH, I think that I have a slight preference toward Fix3 (removal), because I don't see the benefit of keeping these functions. It seems that making these functions safe (Fix1) will make them somewhat performance-hostile (i.e. they can lead to repeatedly reinitializing a slice of memory, which means that using compress_vec can regress, rather than improve performance compared to compress). Maybe I missed some other benefits of keeping these functions around?

/cc @Byron

Can you please provide your feedback on this issue? Which of the fix options discussed above would you prefer? (e.g. out of the ones in #220 (comment))

It may be worth continuing the community discussion on this issue, but I also wonder who the decision maker here is (i.e. who can review, approve, and merge PRs)? I see that you've merged most of the recent PRs, so I hope you won't mind being CC-ed / summoned into this issue :-).

Byron commented

Thanks for pulling me in - I was too busy in the past days but am glad to join now, since all the information has been gathered so neatly. It took an hour to catch up on all that was said, with a considerable portion spent on reading Ralf Jung's excellent articles (again, probably; I keep forgetting).

With that said, here is my amateur perspective on uninitialized memory:

  • it's like Option<u8> and hence observing it is forbidden until it becomes Some(v). If it's observed anyway, it's UB.
  • Creating a slice from uninitialized memory is UB because it allows reading. Here it's less about whether it's actually read, but that it can be read, and strange optimizations can happen if this is done. The compiler can also insert spurious reads for optimization, which may cause said memory to be read even though it looks like it will only be written to.

So for me it's also established that the current behaviour of compress_vec needs fixing.

I think it's worth spelling out that correctness always trumps performance, so performance considerations shouldn't hold back a simple fix - optimizations, possibly like those discussed, can be made later if there is demand.

As a general stance, we will avoid breaking changes, as these would ripple through the ecosystem with a lot of churn, so any fix here must be non-breaking.

From the above it seems clear that #373 is a suitable and welcome fix.

Regarding performance

gitoxide is an avid user of flate2. The most performance-sensitive code is stream decompression, which happens to use decompress() with its own buffer management. I spent some time profiling the code to arrive there, and it appears that those who truly depend on the fastest possible implementation wouldn't use (de)compress_vec() anyway. gitoxide settled on reusing allocated buffers.

It's notable that gitoxide still doesn't re-use inflate state which it truly should for even more performance, as this code is called a whole lot, but I digress.

Conclusion

Yes, let's merge the fix and not worry about performance as those with performance concerns will have suitable alternatives. Correctness is more important.

Maybe there are more spots of code that have soundness issues like this? I'd love to see them rooted out. Thanks everyone for your help and engagement; it's much appreciated. I am grateful too, as I feel like I became a little more aware and could adjust my mental model of uninitialized memory. I think gitoxide also has these issues and I will revisit them (i.e. v = Vec::with_capacity(x); v.set_len(x); &mut v).

Any plans for releasing a new version of the crate? (If so, it would be greatly appreciated, as it would help me get approval to import the crate into my project without having to carry a patch on top of the last released version.)

@Byron

it's like Option<u8> and hence observing it is forbidden until it becomes Some(v). If it's observed anyway, it's UB.

This is incorrect, and a common misapprehension. Producing a value from uninitialized memory is instantly UB even if you never read from it. I.e. MaybeUninit::uninit().assume_init() is always UB even if you ignore the value.

The exception to this is MaybeUninit itself and certain aggregates of it (like arrays of MaybeUninit), which are allowed to inhabit the uninit state.

Similarly,

Creating a slice from uninitialized memory is UB because it allows reading

it's UB automatically, regardless of what it allows

Byron commented

it's UB automatically, regardless of what it allows

Uninitialised memory is even meaner than I thought 😅.

I will prepare a new release now.