KillingSpark/zstd-rs

Undefined Behavior: ringbuffer code might produce uninitialized bytes

Manishearth opened this issue · 10 comments

Caught during unsafe code review.

The RingBuffer implementation is pretty decent. However when allocating the internal buffer it uses alloc().

let new_buf = alloc(new_layout);

alloc() produces uninitialized bytes, so we should be really careful to never read them or even produce anything that isn't a raw pointer that refers to them as their type (u8 here). Even though u8 is plain old data and allows all byte sequences, it is still not allowed to be uninit (you need MaybeUninit<u8> for that). See the docs for more info.

Now, most of the code is fine around this. It only ever produces slices to the inhabited portion of the buffer, and head/tail only expand to cover more bytes during a write operation that will ptr::write() a new byte (as opposed to writing to it as an &mut u8, which is UB).

There's one exception: the extend_from_within_unchecked() code:

self.buf
.as_ptr()
.add(self.tail)
.copy_from_nonoverlapping(self.buf.as_ptr().add(self.head + start), after_tail);

It copies the data around (which is safe), but because it is expanding the data, it does have a potential issue where the reserve len ends up spilling past tail and covering uninit data. For example, imagine expanding the ringbuffer xHDDDTxxxxxxx at start=2, len=5, giving xHDddtxxDDTxx where x is uninit, and d/t refer to the old leftover values after the copy. Some of the area of len 5 is still init due to the leftover values, but the rest is uninit.

The easiest solution is to use alloc_zeroed() instead of alloc(), but otherwise the extend functions could specifically be patched up to write zeroes to the new region, or require this be managed as a caller invariant (and then ensure their callers uphold this. I do not think they currently do).

Thanks for the detailed writeup! I'll look this through closely.

Right now I am leaning towards requiring that start+len < bufferlen that should fix the specific issue you raised.

On the other hand allocations shouldn't happen that often so the extra costs of zeroing should honestly not be that big. As an optimization we could also only zero the parts we did not write to after reallocation.

I'll try a few things, benchmark them, and see what fits the best.

Miri should be very good at finding this kind of issues

Wait I am sorry @Manishearth the unsafe functions already have the requirement:

    /// SAFETY:
    /// Needs start + len <= self.len()
    /// And more then len reserved space
    pub unsafe fn extend_from_within_unchecked(&mut self, start: usize, len: usize) {

And the three call sites do check these requirements,

That does effectively prevent the issue you raised right?

It copies the data around (which is safe), but because it is expanding the data, it does have a potential issue where the reserve len ends up spilling past tail and covering uninit data. For example, imagine expanding the ringbuffer xHDDDTxxxxxxx at start=2, len=5, giving xHDddtxxDDTxx where x is uninit, and d/t refer to the old leftover values after the copy. Some of the area of len 5 is still init due to the leftover values, but the rest is uninit.

The call with start=2, len=5 would be illegal as they do not fit the documented restraints.

Ah, I misread that as talking about the buffer capacity.

In that case, I think it would be better to have a more worked out comment about why the caller upholds this invariant

// we know that start_idx <= buf_len and start_idx + offset == buf_len and we reserverd match_length space

since the comment is true at the beginning of the loop but it's trickier to show for sure in future iterations.

During review I had to really stare at this function to be sure the invariants were upheld even though there were comments. Might be good for future unsafe reviewers and people attempting to modify the code.

Fair enough, I will make a more thorough analysis of the call sites and give a more detailed reasoning on why the invariants are upheld

Thank you so much!

If I get a chance I also noted down a bunch of invariants during unsafe review that may be contributable as comments on the ringbuffer code if you would find that useful. Not sure when I'll get time to do this but worth contributing back if I can!

Oh I would be very interested in those comments. If you dont get around to a PR I'd be glad to just get your notes in text or whatever

Opened #45 . Could be better, but that's as much time I can spare for now.

Would you say commit 34ff967 resolves this issue?