[BUG] Low recall rate on a custom dataset
igmor opened this issue · 0 comments
igmor commented
Expected Behavior
I ran the benchmarks from the DiskANN repo on a custom dataset of roughly 100K vectors (99,800) with 512 dimensions and got some strange results:
a low recall rate that goes down as L goes up, and low QPS.
I would normally expect QPS to go down as dimensions go up, but not that dramatically.
Actual Behavior
Here are the results:
Loading the cache list into memory....done.
    L  Beamwidth      QPS   Mean Latency   99.9 Latency   Mean IOs   CPU (s)   Recall@10
========================================================================================
   10          2    70.72        6706.17       25599.00       9.81    668.29       66.52
   20          2    60.99       11898.02       22016.00      18.96    980.30       46.50
   30          2    44.41       23712.32      195452.00      28.11   1261.76       43.39
   40          2    33.59       38685.81      326683.00      37.23   1606.08       43.05
   50          2    26.99       52989.25      394442.00      46.21   1943.90       42.04
  100          2    13.76      124475.75      566027.00      90.94   3552.57       36.63
Example Code
I can share a sample of the dataset for you to run on your side.
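To cross-check the Recall@10 numbers independently of the benchmark tooling, something like the following brute-force computation could be used. This is only a minimal sketch: it assumes L2 distance and that the base vectors, the queries, and the returned ids are already loaded as NumPy arrays. The function names `brute_force_topk` and `recall_at_k` are made up for illustration and are not part of DiskANN.

```python
import numpy as np


def brute_force_topk(base, queries, k):
    """Exact top-k neighbour ids per query by squared L2 distance (assumed metric)."""
    # ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2, computed for all pairs at once.
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * (queries @ base.T)
        + (base ** 2).sum(axis=1)
    )
    # Pick the k smallest distances per row, then sort those k by distance.
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    rows = np.arange(queries.shape[0])[:, None]
    order = np.argsort(d2[rows, idx], axis=1)
    return idx[rows, order]


def recall_at_k(result_ids, truth_ids, k):
    """Percentage of the true top-k ids that appear in the returned top-k ids."""
    hits = 0
    for returned, truth in zip(result_ids, truth_ids):
        hits += len(set(map(int, returned[:k])) & set(map(int, truth[:k])))
    return 100.0 * hits / (k * len(truth_ids))
```

With 99,800 base vectors this is cheap enough to run for a few hundred queries, and a large gap between this value and the benchmark's Recall@10 would point at the ground truth or the data files rather than the search itself.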
Dataset Description
Please tell us about the shape and datatype of your data, (e.g. 128 dimensions, 12.3 billion points, floats)
- Dimensions: 512
- Number of Points: 99800
- Data type: float
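The shape above can be verified against the file on disk with a small header check. This sketch assumes the `.bin` layout as I understand it from the DiskANN tools: two little-endian int32 values (number of points, then dimensions) followed by row-major float32 data. That layout is an assumption and should be confirmed against the version in use; the helper names are hypothetical.

```python
import os
import struct

import numpy as np


def write_float_bin(path, data):
    """Write a 2-D array in the assumed layout: int32 npts, int32 dim, float32 rows."""
    data = np.ascontiguousarray(data, dtype="<f4")
    with open(path, "wb") as f:
        f.write(struct.pack("<ii", data.shape[0], data.shape[1]))
        f.write(data.tobytes())


def read_bin_header(path):
    """Return (npts, dim) from the first 8 bytes of the file."""
    with open(path, "rb") as f:
        npts, dim = struct.unpack("<ii", f.read(8))
    return npts, dim


def check_float_bin(path, expected_npts, expected_dim):
    """True if the header matches the expected shape and the file size is consistent."""
    npts, dim = read_bin_header(path)
    expected_size = 8 + npts * dim * 4  # 8-byte header + float32 payload
    return (npts, dim) == (expected_npts, expected_dim) and os.path.getsize(path) == expected_size
```

For the dataset described here the call would be `check_float_bin(path, 99800, 512)`, where `path` is the base file passed to the index builder.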
Error
See above: recall drops as L increases, and QPS is low.
Your Environment
- Ubuntu 20.04.1
- DiskANN version (or commit built from)