Suboptimal assembly for Contiguous derive
e00E opened this issue · 9 comments
The derived implementation of Contiguous::into_integer
prevents some optimizations from happening. See the following example in which I have commented the resulting assembly as retrieved through cargo-show-asm on my stable-x86_64-unknown-linux-gnu machine.
use bytemuck::Contiguous;
#[derive(Copy, Clone, Contiguous)]
#[repr(u8)]
pub enum E {
E0,
E1,
E2,
}
impl E {
// movzx eax, byte ptr [rdi]
pub fn ref_as(&self) -> Option<Self> {
Self::from_integer(*self as u8)
}
// movzx eax, byte ptr [rdi]
pub fn ref_into(&self) -> Option<Self> {
Self::from_integer(self.into_integer())
}
// mov eax, edi
pub fn val_as(self) -> Option<Self> {
Self::from_integer(self as u8)
}
// cmp dil, 3
// mov eax, 3
// cmovb eax, edi
pub fn val_into(self) -> Option<Self> {
Self::from_integer(self.into_integer())
}
}
The assembly for E::val_into
should be the same as for E::val_as
.
This could be an issue with the default forbidden-to-override implementation of Contiguous::into_integer
or it could be an issue with rustc. As a workaround the derive macro could emit as Int
since it knows what the repr type is. If you feel this is a rustc issue then I am happy to report it there.
I'd certainly be happy to accept a PR which improves the generated assembly of any of the proc-macros.
The issue is cause by the transmute in into_integer
. Casting to an integer rather than using transmute does allow the optimizer to preserve the range from the enum.
Hopefully fixed for Rust 1.71 -- try it out on nightly and let me know if it helped.
Thanks. Will try to remember to test this again when 1.71 releases.
Hopefully fixed for Rust 1.71 -- try it out on nightly and let me know if it helped.
Was not fixed by Rust 1.71. Tried with the stable release today.
Oh, I see the problem. When I read
The issue is cause by the transmute in
into_integer
. Casting to an integer rather than using transmute does allow the optimizer to preserve the range from the enum.
I heard mem::transmute
, not transmute!
.
With true transmute
they now do the same thing (https://rust.godbolt.org/z/Y6PrYbqaW -- so the same that they get merged), but the transmute_copy(&ManuallyDrop
dance doesn't get the extra assume
s.
So I guess the way forward here is either
- Update the derive to override
into_integer
to an implementation using a cast rather than using the provided impl, as was mentioned in #175 (comment), or - Convince libs-api to expose a stable API wrapper for https://doc.rust-lang.org/nightly/std/intrinsics/fn.transmute_unchecked.html
Thanks for your analysis. I might look into the derive change.
That's my fault since I only reported mem::transmute
and not mem::transmute_copy
. I'll open an issue for that since it should also get the same treatment.
aside: now that ptr reading is const fn i hope transmute_copy is const fn soon too