Lokathor/bytemuck

use #[inline(always)] instead of #[inline]


There are multiple places where #[inline] is used in bytemuck. I suspect the performance benefit of switching to #[inline(always)] would not be meaningful in many cases, but in some latency-sensitive cases #[inline(always)] may come in handy.
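To make the proposal concrete, here is a minimal sketch of the two attributes on a cast_ref-style helper. The function names view_bytes_hint and view_bytes_always are hypothetical, and this is not bytemuck's actual implementation. The Rust reference describes every form of #[inline] as a hint, so #[inline(always)] asks more strongly than #[inline] but still does not strictly guarantee inlining.

```rust
// Hypothetical stand-ins for a cast_ref-style helper; NOT bytemuck's source.
// The two functions differ only in their inline attribute.

/// View a &u32 as its native-endian bytes without copying.
/// `#[inline]` is a hint; the optimizer may still decline to inline it.
#[inline]
pub fn view_bytes_hint(x: &u32) -> &[u8; 4] {
    // SAFETY: u32 and [u8; 4] have the same size, u32 has no padding bytes,
    // and [u8; 4] has alignment 1, so any valid &u32 is a valid &[u8; 4].
    unsafe { &*(x as *const u32 as *const [u8; 4]) }
}

/// Same operation; `#[inline(always)]` asks the compiler to inline it even
/// where its own cost model would not (still formally a hint).
#[inline(always)]
pub fn view_bytes_always(x: &u32) -> &[u8; 4] {
    // SAFETY: same reasoning as in view_bytes_hint.
    unsafe { &*(x as *const u32 as *const [u8; 4]) }
}

fn main() {
    let x = 0x1234_5678u32;
    // Both variants must produce exactly the same bytes as to_ne_bytes.
    assert_eq!(view_bytes_hint(&x), &x.to_ne_bytes());
    assert_eq!(view_bytes_always(&x), view_bytes_hint(&x));
}
```

Because the attribute only affects code generation, any difference between the two should show up in timing, never in results.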

I've run benchmarks in my project for cast and cast_ref and saw about a 30% latency improvement when using #[inline(always)]. If needed, I'll share the benchmark code!
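A rough sketch of how such a comparison could be set up using only the standard library. This is not the benchmark referred to above; cast_hint, cast_always, and time_it are hypothetical names, and a dedicated harness such as criterion would give far more reliable numbers than a single Instant-based loop. std::hint::black_box keeps the optimizer from constant-folding the inputs or discarding the result.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical by-value cast stand-ins (not bytemuck's code),
// differing only in their inline attribute.
#[inline]
fn cast_hint(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}

#[inline(always)]
fn cast_always(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}

/// Call `f` on `iters` different inputs, returning the elapsed time and
/// the wrapping sum of the results (so the work cannot be optimized away).
fn time_it(iters: u32, f: impl Fn([u8; 4]) -> u32) -> (Duration, u64) {
    let mut acc: u64 = 0;
    let start = Instant::now();
    for i in 0..iters {
        acc = acc.wrapping_add(f(black_box(i.to_ne_bytes())) as u64);
    }
    (start.elapsed(), black_box(acc))
}

fn main() {
    const ITERS: u32 = 1_000_000;
    let (t_hint, sum_hint) = time_it(ITERS, cast_hint);
    let (t_always, sum_always) = time_it(ITERS, cast_always);
    // Both variants must compute the same thing; only timing may differ.
    assert_eq!(sum_hint, sum_always);
    println!("#[inline]:         {:?}", t_hint);
    println!("#[inline(always)]: {:?}", t_always);
}
```

Running this with and without the release profile (and with different lto or codegen-units settings) is one way to see whether the attribute makes any difference in a given build.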

I would be interested in such benchmarks, and I'm sure others would as well.

For functions as small as the bytemuck ones, normal inline should be enough in an optimized build.

@Lokathor Sure! I'll clean up the benchmark code and post the link here.

In the meantime, when you say "optimized build", do you also mean setting lto and codegen_units, or simply setting opt-level?

Without looking into it deeply, I would have expected an opt-level of 3 (the default for the release profile) to be sufficient.

However, I'm happy to have clear examples of cases where it doesn't happen, mostly because such a case could probably be shown to a rustc dev, who might be able to fix the compiler itself. That would help not only bytemuck but also other crates with similar missed inlining of small functions.