
🐍 Python binding for the Hora Approximate Nearest Neighbor Search Algorithm library

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

horapy

[Homepage] [Document] [Examples] [Hora]

Python binding for the Hora Approximate Nearest Neighbor Search library

Key Features

  • Performant ⚡️

    • SIMD-Accelerated (packed_simd)
    • Stable algorithm implementation
    • Multi-threaded design
  • Multiple Indexes Support 🚀

    • Hierarchical Navigable Small World Graph Index (HNSWIndex)
    • Satellite System Graph (SSGIndex)
    • Product Quantization Inverted File (PQIVFIndex)
    • Random Projection Tree (RPTIndex) (LSH, WIP)
    • BruteForce (BruteForceIndex) (naive implementation with SIMD)
  • Portable 💼

    • Supports no_std (WIP, partial)
    • Supports Windows, Linux and OS X
    • Supports iOS and Android (WIP)
    • No heavy dependency, such as BLAS
  • Reliability 🔒

    • All code is checked by the Rust compiler
    • Memory managed by Rust
    • Broad test coverage
  • Multiple Distances Support 🧮

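    • Dot Product Distance
      • $D(x, y) = \sum_i x_i y_i$
    • Euclidean Distance
      • $D(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$
    • Manhattan Distance
      • $D(x, y) = \sum_i |x_i - y_i|$
    • Cosine Similarity
      • $D(x, y) = \frac{\sum_i x_i y_i}{\|x\| \, \|y\|}$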
  • Productive

    • Well documented
    • Elegant and simple API, easy to learn
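
The four measures listed under "Multiple Distances Support" can be sketched in plain Python using their textbook definitions. This is only an illustration: the function names below are hypothetical and not part of horapy, and Hora's internal implementations may differ in sign, normalization or SIMD details.

```python
import math


def dot_product(x, y):
    # Sum of element-wise products.
    return sum(a * b for a, b in zip(x, y))


def euclidean(x, y):
    # Square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    # Sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    # Dot product divided by the product of the two vector norms.
    norm = math.sqrt(dot_product(x, x)) * math.sqrt(dot_product(y, y))
    return dot_product(x, y) / norm


x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]

print(dot_product(x, y))  # 25.0
print(euclidean(x, y))    # 5.0
print(manhattan(x, y))    # 7.0
print(cosine_similarity(x, y))
```

Euclidean and Manhattan are true distances (smaller means closer), while the dot product and cosine similarity grow as vectors become more alike, so a library has to pick a convention when it ranks neighbors by them.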

Benchmark

Measured on an AWS t2.medium instance (CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz).

Installation

pip install horapy

Example

import numpy as np
from horapy import HNSWIndex

dimension = 50
n = 1000

# init index instance
index = HNSWIndex(dimension, "usize")

# generate n random float32 vectors
samples = np.float32(np.random.rand(n, dimension))
for i, sample in enumerate(samples):
    # add each vector to the index, using its row number as the id
    index.add(sample, i)

index.build("euclidean")  # build index with the Euclidean metric

# pick a random sample and search for its 10 nearest neighbors
target = np.random.randint(0, n)
print("{} in {} \nhas neighbors: {}".format(
    target, index, index.search(samples[target], 10)))  # search
# example output (the target and its neighbors vary between runs):
# 410 in Hora ANNIndex <HNSWIndexUsize> (dimension: 50, dtype: usize, max_item: 1000000, n_neigh: 32, n_neigh0: 64, ef_build: 20, ef_search: 500, has_deletion: False)
# has neighbors: [410, 736, 65, 36, 631, 83, 111, 254, 990, 161]
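
Because the search is approximate, it can be useful to compare its results against an exact scan. The sketch below uses only the standard library; `brute_force_knn` and `recall` are hypothetical helper names for illustration, not part of horapy, and the scan assumes the Euclidean metric used in the example above.

```python
import heapq


def brute_force_knn(samples, query, k):
    # Exact k nearest neighbors by scanning every sample.
    # Squared Euclidean distance is used because it preserves the ranking.
    def squared_distance(i):
        return sum((a - b) ** 2 for a, b in zip(samples[i], query))

    return heapq.nsmallest(k, range(len(samples)), key=squared_distance)


def recall(approx_ids, exact_ids):
    # Fraction of the exact neighbors that the approximate result also found.
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
query = [0.9, 0.1]

exact_ids = brute_force_knn(samples, query, 2)
print(exact_ids)                  # [1, 0]
print(recall([1, 2], exact_ids))  # 0.5
```

With the example above, the list returned by `index.search(samples[target], 10)` could be passed as `approx_ids` and `brute_force_knn(samples, samples[target], 10)` as `exact_ids`; a recall near 1.0 suggests the index parameters are adequate for the data.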

License

The entire repository is licensed under the Apache License 2.0.