miri: improve current situation
phip1611 opened this issue · 4 comments
In #128, I added basic miri support. Due to the following issues, miri can't embrace all it's funcitionality:
- Each test that uses the
cast_tag(transitively) fails, as miri can't guarantee that the memory is valid, unfortunately - when something like
struct_as_bytes() ... castis done, there are alignment issues as the returned Vec is of course only aligned to a one byte boundary... so far, in tests this never was a problem, luckily. At least on x86.
I'm 99.9% sure that the current crate is memory-safe. Unit tests, integration tests, at least a partial miri coverage. However, having miri on your side is a benefit, of course.
Update: This might not be true. When it comes to UB, one has to think about the "Rust abstract machine", not what the hardware in the end does.
I created a minimal reproducer for the miri issue. cast_tag causes the problem.
I also tried to solve it by giving all tags a 'a lifetime. Didn't work
use std::mem::size_of;
use std::ops::Deref;
use ptr_meta::Pointee;
#[repr(C, align(8))]
struct Align8<T>(T);
impl<T> Deref for Align8<T> {
type Target = T;
fn deref(&self) -> &Self::Target {
&self.0
}
}
#[derive(Debug, PartialEq, Eq)]
#[repr(u32)]
enum TagType {
Foo = 1,
Dst = 2,
}
#[repr(C)]
struct BaseTag {
typ: TagType,
size: u32,
}
// Tag type: 0x1
#[repr(C)]
struct FooTag {
base: BaseTag,
val: u64,
}
// Tag type: 0x2
#[derive(ptr_meta::Pointee)]
#[repr(C)]
struct DstTag {
base: BaseTag,
// embedded string slice
name: [u8],
}
impl DstTag {
fn name(&self) -> &str {
core::str::from_utf8(&self.name).unwrap()
}
}
fn cast_tag<'a, T: ?Sized + Pointee<Metadata = usize> + 'a>(tag: &'a BaseTag) -> &'a T {
let ptr = core::ptr::addr_of!(*tag);
unsafe { &*ptr_meta::from_raw_parts(ptr.cast(), tag.size as usize - size_of::<BaseTag>()) }
}
mod tests {
use super::*;
#[test]
fn test_parse_tag() {
let bytes = Align8([
// <begin of tag foo>
TagType::Foo as u32 as u8, // tag foo: type
0x0,
0x0,
0x0,
16, // tag foo: size
0x0,
0x0,
0x0,
0x37, // tag foo: val
0x13,
0x0,
0x0,
0x0,
0x0,
0x0,
0x0,
// <end of tag foo>
// <begin of tag dst>
TagType::Dst as u32 as u8, // tag dst: type
0x0,
0x0,
0x0,
16, // tag dst: size
0x0,
0x0,
0x0,
b'h',
b'e',
b'l',
b'l',
b'o',
b' ',
b'!',
b'\0', // tag foo: name
// <end of tag dst>
]);
// Parsing of Tag Foo
let base_tag_ptr = bytes.as_ptr().cast::<BaseTag>();
let base_tag_ref = unsafe { &*base_tag_ptr };
assert_eq!(base_tag_ref.size, 16);
assert_eq!(base_tag_ref.typ, TagType::Foo);
let tag_foo = unsafe { &*base_tag_ptr.cast::<FooTag>() };
assert_eq!(tag_foo.val, 0x1337);
// Parsing of Tag Dst
let base_tag_ptr = unsafe {
bytes
.as_ptr()
.add(tag_foo.base.size as usize)
.cast::<BaseTag>()
};
let base_tag_ref = unsafe { &*base_tag_ptr };
assert_eq!(base_tag_ref.typ, TagType::Dst);
assert_eq!(base_tag_ref.size, 16);
let tag_dst = cast_tag::<DstTag>(base_tag_ref);
assert_eq!(tag_dst.name(), "hello !\0");
}
}This patch solves the miri issues:
0a1
> use ptr_meta::Pointee;
3d3
< use ptr_meta::Pointee;
51,53c51,60
< fn cast_tag<'a, T: ?Sized + Pointee<Metadata = usize> + 'a>(tag: &'a BaseTag) -> &'a T {
< let ptr = core::ptr::addr_of!(*tag);
< unsafe { &*ptr_meta::from_raw_parts(ptr.cast(), tag.size as usize - size_of::<BaseTag>()) }
---
> fn cast_tag<'a, T: ?Sized + Pointee<Metadata = usize> + 'a>(
> backing_mem: &'a [u8],
> ) -> &'a T {
> let base_tag = unsafe { backing_mem.as_ptr().cast::<BaseTag>().as_ref() }.unwrap();
> unsafe {
> &*ptr_meta::from_raw_parts(
> backing_mem.as_ptr().cast(),
> base_tag.size as usize - size_of::<BaseTag>(),
> )
> }
97c104
< // <end of tag dst>
---
> // <end of tag dst>
118c125,127
< let tag_dst = cast_tag::<DstTag>(base_tag_ref);
---
> let offset = base_tag_ptr.cast::<u8>() as usize - bytes.as_ptr() as usize;
> let bytes_slices = &bytes[offset..];
> let tag_dst = cast_tag::<DstTag>(bytes_slices);
A lively discussion on Reddit (thanks everyone!) pointed out that my assumption was wrong. My assumption was that one can access memory outside the scope of a function if the caller level ensures that memory is valid. However, rustc only looks at functions and not the whole call tree to improve performance during compilation.
Here are the options I have:
- go with approach B (see patch above)
- make
cast_taga macro - Split the type
TagintoCommonHeaderandGenericTag, where the latter is a DST with a[u8]field after the common header. This ensures that every cast only operates on memory that the function also has access to by its inputs.