Unofficial python wrapper to the nanoflann library [1] with sklearn interface and additional multithreaded capabilities.
nanoflann implementation of k-d tree provides one of the best performance for many k-nearest neighbour problems [2].
It is a good choice for exact k-NN, radius searches in low dimensional spaces.
pip install git+https://github.com/u1234x1234/pynanoflann.git@0.0.8
import numpy as np
import pynanoflann
index = np.random.uniform(0, 100, (100, 4))
queries = np.random.uniform(0, 100, (10, 4))
nn = pynanoflann.KDTree(n_neighbors=5, metric='L1', radius=100)
nn.fit(index)
# Get k-nearest neighbors
distances, indices = nn.kneighbors(queries)
# Radius search
distances, indices = nn.radius_neighbors(queries)
If you need to save the model, there are two options:
- Save only the built index. In this case, data points are NOT stored in file. Very efficient, but inconvenient in some cases.
kdtree.fit(X)
kdtree.save_index('index.bin')
prebuilt_kdtree = pynanoflann.KDTree()
# Must use the same data on which the index was built.
prebuilt_kdtree.fit(X, 'index.bin')
Please refer to the detailed example
- Pickle the whole model with data points. Less efficient, but convenient.
kdtree.fit(X)
with open('kdtree.pkl', 'wb') as out_file:
pickle.dump(kdtree, out_file)
with open('kdtree.pkl', 'rb') as in_file:
unpickled_kdtree = pickle.load(in_file)
Please refer to the detailed example
Implemented on C++ side to reduce python's multiprocessing
overhead.
-
Query parallelization: Example
-
Simultaneous indexing+querying parallelization: Example, Discussion
Generally it much faster than brute force or cython implementation of k-d tree in sklearn
To run benchmark:
python benchmark.py
- Blanco, Jose Luis and Rai, Pranjal Kumar, 2014, nanoflann: a C++ header-only fork of FLANN, a library for Nearest Neighbor ({NN}) with KD-trees.
- Vermeulen, J.L., Hillebrand, A. and Geraerts, R., 2017. A comparative study of k‐nearest neighbour techniques in crowd simulation.
- OpenFLANN Performance comparison between PCL, nanoflann, picoflann