hora-search/hora

Crash using cosine similarity when calling search


When indexing a small number of vectors, calling search panics with the error below if I specify cosine_similarity (euclidean, for instance, works fine):

thread 'hora_test' panicked at 'called `Option::unwrap()` on a `None` value', /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:54
stack backtrace:
   0: rust_begin_unwind
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
   2: core::panicking::panic
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:48:5
   3: core::option::Option<T>::unwrap
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/option.rs:752:21
   4: <hora::core::neighbor::Neighbor<E,T> as core::cmp::Ord>::cmp
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:9
   5: <hora::core::neighbor::Neighbor<E,T> as core::cmp::PartialOrd>::partial_cmp
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:38:14
   6: core::cmp::PartialOrd::le
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/cmp.rs:1129:19
   7: core::cmp::impls::<impl core::cmp::PartialOrd<&B> for &A>::le
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/cmp.rs:1505:13
   8: alloc::collections::binary_heap::BinaryHeap<T>::sift_up
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/collections/binary_heap.rs:562:16
   9: alloc::collections::binary_heap::BinaryHeap<T>::push
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/collections/binary_heap.rs:496:18
  10: hora::index::hnsw_idx::HNSWIndex<E,T>::search_layer::{{closure}}
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:363:25
  11: <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/slice/iter/macros.rs:211:21
  12: hora::index::hnsw_idx::HNSWIndex<E,T>::search_layer
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:353:13
  13: hora::index::hnsw_idx::HNSWIndex<E,T>::search_knn
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:433:25
  14: <hora::index::hnsw_idx::HNSWIndex<E,T> as hora::core::ann_index::ANNIndex<E,T>>::node_search_k
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/index/hnsw_idx.rs:615:55
  15: hora::core::ann_index::ANNIndex::search
             at /Users/sam/.cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/ann_index.rs:93:9
  16: hora_c::hora_test
             at ./src/lib.rs:192:13
  17: hora_c::hora_test::{{closure}}
             at ./src/lib.rs:168:1
  18: core::ops::function::FnOnce::call_once
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:227:5
  19: core::ops::function::FnOnce::call_once
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    hora_test

Same. Do you have any clue?

I ended up switching to brute-force search for my use case, so I haven't revisited it.

https://github.com/spullara/bfes

I had the same issue. It is caused by commit fca4516, which negates the output of the dot product, so calculating the cosine distance ends up calling sqrt() on a negative number. I'm not sure why the change was made, but reverting it fixed CosineSimilarity for me, though it may break other things. You can see the change I made here: rangsikitpho@0836f2c
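For anyone trying to connect this to the backtrace, here is a minimal sketch of the failure mode. It is not hora's actual code: `negated_dot`, `cosine_with_negated_dot`, `partial_order` and `total_order` are hypothetical names made up for illustration, and I am assuming the comparison in neighbor.rs is a `partial_cmp(..).unwrap()` on the distance, which is what the trace suggests. The square root of a negative float is NaN, and `partial_cmp` returns `None` whenever one side is NaN, so the unwrap panics as soon as the heap compares two neighbors.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a dot product whose sign has been flipped.
fn negated_dot(a: &[f32], b: &[f32]) -> f32 {
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

/// Cosine-style value built on `negated_dot`. For any non-zero vector v,
/// `negated_dot(v, v)` is negative, so its square root is NaN and the
/// whole expression becomes NaN.
fn cosine_with_negated_dot(a: &[f32], b: &[f32]) -> f32 {
    negated_dot(a, b) / (negated_dot(a, a).sqrt() * negated_dot(b, b).sqrt())
}

/// Partial comparison of two distances. Returns `None` if either is NaN,
/// which is the case where a following `.unwrap()` would panic.
fn partial_order(x: f32, y: f32) -> Option<Ordering> {
    x.partial_cmp(&y)
}

/// Total comparison of two distances. `f32::total_cmp` (Rust 1.62+) always
/// returns an ordering, so it cannot panic on NaN. It hides the bad value
/// rather than fixing it.
fn total_order(x: f32, y: f32) -> Ordering {
    x.total_cmp(&y)
}
```

With this sketch, `cosine_with_negated_dot(&[1.0, 2.0], &[3.0, 4.0])` is NaN and `partial_order` on it returns `None`. Swapping in something like `total_order` would only stop the panic; the distances would still be NaN, so removing the negation is the actual fix.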

Thanks @rangsikitpho!
Removing the negation on lines 28 and 32 fixes this, and the top distance pairs look something like [(0, -0.060707208), (3, -0.26921165), (1, -0.6891982), (2, -0.9331413)].
I just use distance.abs() as the display score.