Lokathor/bytemuck

Suboptimal assembly for Contiguous derive

e00E opened this issue · 9 comments

e00E commented

The derived implementation of Contiguous::into_integer prevents some optimizations from happening. See the following example in which I have commented the resulting assembly as retrieved through cargo-show-asm on my stable-x86_64-unknown-linux-gnu machine.

use bytemuck::Contiguous;

#[derive(Copy, Clone, Contiguous)]
#[repr(u8)]
pub enum E {
    E0,
    E1,
    E2,
}

impl E {
    // movzx eax, byte ptr [rdi]
    pub fn ref_as(&self) -> Option<Self> {
        Self::from_integer(*self as u8)
    }

    // movzx eax, byte ptr [rdi]
    pub fn ref_into(&self) -> Option<Self> {
        Self::from_integer(self.into_integer())
    }

    // mov eax, edi
    pub fn val_as(self) -> Option<Self> {
        Self::from_integer(self as u8)
    }

    // cmp dil, 3
    // mov eax, 3
    // cmovb eax, edi
    pub fn val_into(self) -> Option<Self> {
        Self::from_integer(self.into_integer())
    }
}

The assembly for E::val_into should be the same as for E::val_as.

This could be an issue with the default forbidden-to-override implementation of Contiguous::into_integer or it could be an issue with rustc. As a workaround the derive macro could emit as Int since it knows what the repr type is. If you feel this is a rustc issue then I am happy to report it there.

I'd certainly be happy to accept a PR which improves the generated assembly of any of the proc-macros.

Jarcho commented

The issue is cause by the transmute in into_integer. Casting to an integer rather than using transmute does allow the optimizer to preserve the range from the enum.

Hopefully fixed for Rust 1.71 -- try it out on nightly and let me know if it helped.

e00E commented

Thanks. Will try to remember to test this again when 1.71 releases.

e00E commented

Hopefully fixed for Rust 1.71 -- try it out on nightly and let me know if it helped.

Was not fixed by Rust 1.71. Tried with the stable release today.

Oh, I see the problem. When I read

The issue is cause by the transmute in into_integer. Casting to an integer rather than using transmute does allow the optimizer to preserve the range from the enum.

I heard mem::transmute, not transmute!.

With true transmute they now do the same thing (https://rust.godbolt.org/z/Y6PrYbqaW -- so the same that they get merged), but the transmute_copy(&ManuallyDrop dance doesn't get the extra assumes.

So I guess the way forward here is either

  1. Update the derive to override into_integer to an implementation using a cast rather than using the provided impl, as was mentioned in #175 (comment), or
  2. Convince libs-api to expose a stable API wrapper for https://doc.rust-lang.org/nightly/std/intrinsics/fn.transmute_unchecked.html
e00E commented

Thanks for your analysis. I might look into the derive change.

Jarcho commented

That's my fault since I only reported mem::transmute and not mem::transmute_copy. I'll open an issue for that since it should also get the same treatment.

aside: now that ptr reading is const fn i hope transmute_copy is const fn soon too