Slice (bulk) volatile write/read methods on VolBlock
nicopap opened this issue · 3 comments
Currently, the only way to write a large amount of data into volatile memory is through iterating through individual VolAddress
and writing a single T
to memory one after another.
I've not inspected myself the generated code, but it seems reasonable to assume this results in slower than optimal code (it seems fast, so maybe it actually is optimizing it!) I've read your amazing writeup on how to hand optimize load/store ASM code, and it seems reasonable that a write_slice
and read_slice
methods could make use of better intrinsics such as volatile_copy_memory
and volatile_set_memory
(Using a pointer cast might just work??)
impl VolBlock {
pub fn write_slice_at_offset(self, offset: usize, slice: &[T]) {
//...
}
pub fn read_slice_at_offset<const L: usize>(self, offset: usize) -> [T; L] {
//...
}
pub fn write_slice(self, slice: &[T]) {
//...
}
pub fn read_slice<const L: usize>(self) -> [T; L] {
//...
}
}
I've written already such methods (though without optimizations, just the API) and I'd be happy to contribute it back if this is wanted.
The problem with the volatile_copy_memory
and volatile_set_memory
intrinsics is that they're still unstable, and likely to be unstable for a long time. That's why I made the design as simple as possible to kinda work around that intrinsic being missing.
I think some of this sort of thing is in the VolSlice type, which is behind a feature gate right now, but it probably would be fine to stabilize into the main library. And the VolBlock type will deref into the VolSlice type, once things are ready.
Oh I see now. I completely missed the VolRegion
struct. That looks exactly like what I'm describing in this issue. Amazing! I guess it's still possible to featuregate the optimized version and keep the sequential read/slice in the stable implementation.
Looking at the volatile crate, I see that's exactly what they came up with.
Though I wonder if casting to a VolAddress<[T; N]>
then writing/reading the array to/from memory compiles down into volatile_copy/set_memory
. Which would be an alright stable alternative (although requires specifying the exact length)
I think for careful optimization purposes, a little more context might be necessary.