rust-lang/rust

Indexing with `slice::binary_search_by`'s result should not produce panicking branch

Opened this issue · 8 comments

In this code:

fn binary_search_repro(needle: &str) -> Option<&str> {
    let haystack = &["a", "b"];
    haystack.binary_search(&needle).ok().map(|i| haystack[i])
}

the compiler should be able to use the hint from slice::binary_search_by in order to elide the length check in the index operation (haystack[i]).

Seems the knowledge from this assert_unchecked is lost somehow:

unsafe { hint::assert_unchecked(base < self.len()) };

Godbolt: https://rust.godbolt.org/z/8zqq8vrrv

for some reason I think I should cc @scottmcm and @the8472 on this optimization problem but maybe I should be tagging @dianqk instead.

(I should probably start searching for existing issues and then filing new ones, but...)

Might be considered a duplicate of this: #98258

Just found this codegen test:

pub fn binary_search_index_no_bounds_check(s: &[u8]) -> u8 {
// CHECK-NOT: panic
// CHECK-NOT: slice_index_fail
// CHECK-NOT: panic_bounds_check
if let Ok(idx) = s.binary_search(&b'\\') { s[idx] } else { 42 }
}

Weirdly enough, if I modify my example's haystack to be a slice of u8s, like in this test, the panicking branch does indeed get elided. With a slice of &strs it doesn't.

not addressing the issue, just sidestepping it:

#[unsafe(no_mangle)]
pub fn binary_search_repro(needle: &str) -> Option<&str> {
    let haystack = &["a", "b"];
    haystack.binary_search(&needle).ok().and_then(|i| haystack.get(i)).map(|v| &**v)
}

Yeah, if that has the "ideal" codegen then that sounds a bit like an optimization bug.

The pattern we end up with looks something like this: https://alive2.llvm.org/ce/z/ZGBKEp

This issue is specific to the case where the "haystack" is a small known array.

The pattern we end up with looks something like this: https://alive2.llvm.org/ce/z/ZGBKEp

This issue is specific to the case where the "haystack" is a small known array.

FWIW, this is the real-life place where I encountered this issue:
https://github.com/pulldown-cmark/pulldown-cmark/blob/f4a326d225e79412b5ecabd1c241c851e8160815/pulldown-cmark/src/entities.rs#L2153-L2158

it's not exactly a "small" array, but it is known.

But thanks to your hint, I managed to elide the panicking branch by rewriting the function as such:

pub(crate) fn get_entity(bytes: &[u8]) -> Option<&'static str> {
    let entities = black_box(ENTITIES);
    entities
        .binary_search_by_key(&bytes, |&(key, _value)| key)
        .ok()
        .map(|i| entities[i].1)
}

(also had to make ENTITIES a slice instead of an array)

it's not exactly a "small" array, but it is known.

Oh, I see. I guess the log2(size) needs to be "small", but that of course is also true for quite large arrays.