zerocopy: deriving the total size in bytes including `Ref` fields
Closed this issue · 5 comments
Would it make sense to include in `ZeroCopy` a function that returns the total size in bytes of the struct, taking into account the length of `Ref`s? One use case for this would be to iterate over a buffer storing a sequence of `T`s.
Would you mind providing a code example to better indicate what it is that you want to do?
Sure. Let's say I have the following type:
#[derive(ZeroCopy)]
#[repr(C)]
struct TestValue {
    timestamp: u32,
    tag: Ref<str>,
}
And I'm building a buffer with a sequence of these values:
let mut buf = OwnedBuf::new();
for tag in ["host:a", "host:b", "host:c"] {
    // Reserve space for the struct first so the tag lands right after it.
    let reference = buf.store_uninit();
    let tag = buf.store_unsized(tag);
    buf.load_uninit_mut(reference).write(&TestValue {
        timestamp: 42,
        tag,
    });
}
Now I want to iterate over the stored `TestValue`s:
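use std::marker::PhantomData;
use std::mem::align_of;

use musli_zerocopy::{Buf, ZeroCopy};

struct BufIterator<'a, T> {
    buf: &'a Buf,
    offset: usize,
    _marker: PhantomData<T>,
}

impl<'a, T> BufIterator<'a, T> {
    fn new(buf: &'a [u8]) -> Self {
        Self {
            buf: Buf::new(buf),
            offset: 0,
            _marker: PhantomData,
        }
    }
}

impl<'a, T: ZeroCopy + 'a> Iterator for BufIterator<'a, T> {
    type Item = (&'a T, &'a Buf);

    fn next(&mut self) -> Option<Self::Item> {
        if self.offset >= self.buf.len() {
            return None;
        }

        let value = self.buf.load_at::<T>(self.offset).expect("valid object");
        // `size_in_bytes` is the method proposed in this issue; `ZeroCopy` does not provide it.
        let size_in_bytes = value.size_in_bytes();
        // Skip the value, its trailing data, and any padding up to the alignment of `T`.
        // `padding_to` is assumed to return the number of bytes needed to round up.
        self.offset += size_in_bytes + padding_to(size_in_bytes, align_of::<T>());
        Some((value, self.buf))
    }
}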
This assumes there's a valid `T` at offset 0. To get to the second `T`, I need to skip over the first `T` and its components. `size_in_bytes` for `TestValue` would be implemented like so:
impl TestValue {
    fn size_in_bytes(&self) -> usize {
        size_of::<Self>() + self.tag.len()
    }
}
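The iterator above calls `padding_to` without defining it. A minimal sketch of what it could look like, assuming it returns the number of bytes needed to round a length up to an alignment (the name and signature come from the call site, the body is an assumption):

```rust
/// Number of padding bytes needed after `len` bytes so that the next value
/// starts at a multiple of `align`.
///
/// Hypothetical helper: the thread calls `padding_to` but never defines it.
/// `align` must be a non-zero power of two, which holds for `align_of::<T>()`.
fn padding_to(len: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    let mask = align - 1;
    // Round `len` up to the next multiple of `align`, then subtract `len`.
    ((len + mask) & !mask) - len
}
```

For the `TestValue` above, the struct takes 12 bytes and `"host:a"` takes 6, so `size_in_bytes()` is 18, `padding_to(18, 4)` is 2, and the next value starts at offset 20.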
Thanks!
I just want to check in with you that this might unintentionally be storing more data than you need.
This:
#[derive(ZeroCopy)]
#[repr(C)]
struct TestValue {
    timestamp: u32,
    tag: Ref<str>,
}
Has a layout of:
#[repr(C)]
struct TestValue {
    timestamp: u32,
    tag_offset: u32,
    tag_len: u32,
}
I think the `tag_offset` would be superfluous for the structure you're looking for, which I think is this:
#[derive(ZeroCopy)]
#[repr(C)]
struct TestValue {
    timestamp: u32,
    tag_len: u32,
}
And where you store it, `tag_len` should indicate the length of the string immediately following the struct.
With that, construction could look like this:
let mut buf = OwnedBuf::new();
for tag in ["host:a", "host:b", "host:c"] {
    let reference = buf.store_uninit();
    let tag = buf.store_unsized(tag);
    buf.load_uninit_mut(reference).write(&TestValue {
        timestamp: 42,
        // Access the packed `u32` with `metadata()` instead of converting it to a length through `len()`.
        tag_len: tag.metadata(),
    });
}
If that's what you want, I'm unsure what the best abstraction would be.
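To make the length-prefixed layout concrete, here is a rough sketch of writing and walking such records using only the standard library. It does not use the musli-zerocopy API; `push_record`, `read_u32`, `records`, and the two constants are hypothetical names, and native-endian `u32` fields with 4-byte record alignment are assumptions:

```rust
/// Size of the fixed header: `timestamp: u32` followed by `tag_len: u32`.
const HEADER_LEN: usize = 8;
/// Assumed record alignment, matching a `#[repr(C)]` struct of `u32` fields.
const ALIGN: usize = 4;

/// Appends one record: header, tag bytes, then zero padding up to `ALIGN`.
fn push_record(buf: &mut Vec<u8>, timestamp: u32, tag: &str) {
    buf.extend_from_slice(&timestamp.to_ne_bytes());
    buf.extend_from_slice(&(tag.len() as u32).to_ne_bytes());
    buf.extend_from_slice(tag.as_bytes());
    while buf.len() % ALIGN != 0 {
        buf.push(0);
    }
}

/// Reads a native-endian `u32` at `at`, or `None` if it is out of bounds.
fn read_u32(buf: &[u8], at: usize) -> Option<u32> {
    let b = buf.get(at..at.checked_add(4)?)?;
    Some(u32::from_ne_bytes([b[0], b[1], b[2], b[3]]))
}

/// Walks the buffer record by record, using `tag_len` to find the next header.
/// Iteration stops at the end of the buffer or at the first malformed record.
fn records(buf: &[u8]) -> impl Iterator<Item = (u32, &str)> + '_ {
    let mut offset = 0usize;
    std::iter::from_fn(move || {
        let timestamp = read_u32(buf, offset)?;
        let tag_len = read_u32(buf, offset + 4)? as usize;
        let start = offset + HEADER_LEN;
        let tag = std::str::from_utf8(buf.get(start..start + tag_len)?).ok()?;
        // Skip header, tag, and padding to land on the next aligned header.
        offset = (start + tag_len + ALIGN - 1) & !(ALIGN - 1);
        Some((timestamp, tag))
    })
}
```

With the three tags from the example, each record occupies 16 bytes (8 for the header, 6 for the tag, 2 of padding), compared with 20 bytes when the header also carries `tag_offset`.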
Thanks, that's a neat optimization. It seems like it is generally useful and could be expressed in the API. Something like:
#[derive(ZeroCopy)]
#[repr(C)]
struct TestValue {
    timestamp: u32,
    tag: RawRef<str>,
    value: RawRef<[u32]>,
}
Where a `RawRef` needs an offset and a buffer to be loaded. I can also see a macro deriving helpers to calculate offsets from the struct's base offset.
Something like `RawRef` would allow the proposed `size_in_bytes()` to be derived, and it also reads better, since it associates type information with the field (with `tag_len: u32` it isn't clear what the type of the underlying field is).
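As a rough illustration of the idea, here is a standard-library-only sketch of such a type. `RawRef` does not exist in musli-zerocopy; everything below is hypothetical, including the `new`, `load`, and `size_in_bytes` methods, and it ignores any padding that would be needed between the payloads:

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical `RawRef<T>`: stores only a length and remembers the target
/// type. The offset is supplied by the caller when loading.
#[repr(transparent)]
struct RawRef<T: ?Sized> {
    len: u32,
    _marker: PhantomData<T>,
}

impl<T: ?Sized> RawRef<T> {
    fn new(len: u32) -> Self {
        Self { len, _marker: PhantomData }
    }
}

impl RawRef<str> {
    /// Bytes occupied by the string payload.
    fn size_in_bytes(&self) -> usize {
        self.len as usize
    }

    /// Loads the string that starts at `offset` in `buf`.
    fn load<'a>(&self, buf: &'a [u8], offset: usize) -> Option<&'a str> {
        let end = offset.checked_add(self.len as usize)?;
        std::str::from_utf8(buf.get(offset..end)?).ok()
    }
}

impl RawRef<[u32]> {
    /// Bytes occupied by the slice payload (`len` counts elements).
    fn size_in_bytes(&self) -> usize {
        self.len as usize * size_of::<u32>()
    }
}

#[repr(C)]
struct TestValue {
    timestamp: u32,
    tag: RawRef<str>,
    value: RawRef<[u32]>,
}

impl TestValue {
    /// What a derive might generate: the fixed part plus each payload.
    /// Padding between payloads is not accounted for in this sketch.
    fn size_in_bytes(&self) -> usize {
        size_of::<Self>() + self.tag.size_in_bytes() + self.value.size_in_bytes()
    }
}
```

Because the field type carries `str` or `[u32]`, a derive would have enough information to sum the payload sizes, which a bare `tag_len: u32` does not provide.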
(Btw, my use case for `zerocopy` might be unorthodox. I have a key-value database, where each value is a timeseries. The in-memory components of the database store the values associated with a key in a collection of fixed-size, linked chunks, allocated from some custom slab allocator. I'm requiring individual values to be `ZeroCopy` for better memory management, and I'm storing them within the chunks.)
I'm not sure what would be added in musli for this. If we start treating the embedded size of values as part of the size of structs, it would go against a number of assumptions made by existing primitives, which I don't think would fit well. So I'll close for now.