How to return more points?
Dronablo opened this issue · 2 comments
Dronablo commented
Hi!
Let's start with the code:
import falconn
import numpy as np
params_cp = falconn.LSHConstructionParameters()
params_cp.dimension = 300
params_cp.lsh_family = falconn.LSHFamily.CrossPolytope
params_cp.distance_function = falconn.DistanceFunction.EuclideanSquared
params_cp.l = 50
params_cp.num_rotations = 2
params_cp.seed = 5721840
params_cp.num_setup_threads = 1
params_cp.storage_hash_table = falconn.StorageHashTable.BitPackedFlatHashTable
# 2**17 <= N_SAMPLES
falconn.compute_number_of_hash_functions(17, params_cp)
table = falconn.LSHIndex(params_cp)
qdata = np.zeros((1000, 300), dtype=np.float32)
for en, i in enumerate(range(0, 1000, 100)):
qdata[i:i+100][:,en*30:(en+1)*30] = 1
# qdata[0]
# qdata[200]
table.setup(qdata)
centroid_index = table.construct_query_object()
centroid_index.set_num_probes(50)
print(len(centroid_index.find_k_nearest_neighbors(qdata[0], k=200)))
print(len(centroid_index.find_near_neighbors(qdata[0], threshold=10e10)))
So, we have 1000 points in 300-dimensional space, grouped in 10 extremely dense areas (in the example above - grouped into dots). When I make a query to get k nearest neigbours, I would never get more points, than in nearest dense area. So, when I ask for k<100 it's ok, but if I want k=200, or k=1000 falconn returns only 100 points.
Is there any way to configure params in such a way, so above script would print
200
1000
?
Thank you!
ludwigschmidt commented
Have you tried increasing the number of probes?
Dronablo commented
Sure!
Here's little bit simpler code, goal is to get more than 3 results:
import falconn
import numpy as np
params_cp = falconn.LSHConstructionParameters()
params_cp.dimension = 6
params_cp.lsh_family = falconn.LSHFamily.CrossPolytope
params_cp.distance_function = falconn.DistanceFunction.EuclideanSquared
params_cp.l = 50
params_cp.num_rotations = 3
params_cp.seed = 5721840
params_cp.num_setup_threads = 1
params_cp.storage_hash_table = falconn.StorageHashTable.BitPackedFlatHashTable
# 2**17 <= N_SAMPLES
falconn.compute_number_of_hash_functions(17, params_cp)
table = falconn.LSHIndex(params_cp)
qdata = np.array([[1,1,1,1,0,0],
[1,2,1,2,0,0],
[2,1,2,1,0,0],
[0,0,0,1,1,1],
[0,0,0,1,2,2],
], dtype=np.float32)
table.setup(qdata)
centroid_index = table.construct_query_object()
centroid_index.set_num_probes(5000)
print(len(centroid_index.find_k_nearest_neighbors(qdata[0], k=200)))
print(len(centroid_index.find_near_neighbors(qdata[0], threshold=10e10)))
Also tried to play with storage_hash_table, num_rotations, l, k - nothing changed.