microsoft/DiskANN

[BUG] Low recall rate on a custom dataset

igmor opened this issue · 0 comments

Expected Behavior

I ran the benchmarks from the DiskANN repo on a custom dataset of 100K vectors with 512 dimensions and got some strange results:
specifically, a low recall rate that goes down as L goes up, and low QPS.

I would normally expect QPS to go down as the number of dimensions goes up, but not this dramatically.

Actual Behavior

Here are the results:

Loading the cache list into memory....done.
     L   Beamwidth             QPS    Mean Latency    99.9 Latency        Mean IOs         CPU (s)       Recall@10
=============================================================================================
    10           2           70.72         6706.17        25599.00            9.81          668.29           66.52
    20           2           60.99        11898.02        22016.00           18.96          980.30           46.50
    30           2           44.41        23712.32       195452.00           28.11         1261.76           43.39
    40           2           33.59        38685.81       326683.00           37.23         1606.08           43.05
    50           2           26.99        52989.25       394442.00           46.21         1943.90           42.04
   100           2           13.76       124475.75       566027.00           90.94         3552.57           36.63

Example Code

I can share a sample of the dataset for you to run on your side.
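
Since recall falling as L rises often points at the ground truth rather than the search itself, one sanity check that does not depend on DiskANN is to recompute Recall@10 by brute force on a small sample of queries. The sketch below is a minimal pure-Python illustration, assuming squared L2 distance and assuming result IDs are row indices into the base set. The function names `squared_l2`, `brute_force_knn` and `recall_at_k` are hypothetical helpers, not part of DiskANN, and recall is reported as a percentage like the Recall@10 column above.

```python
import heapq


def squared_l2(a, b):
    """Exact squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(base, query, k):
    """Return the row indices of the k base vectors closest to `query`.

    Assumption: L2 is the metric the index was built with; swap in another
    distance function if the index uses a different one.
    """
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(base))
    return [idx for _, idx in heapq.nsmallest(k, scored)]


def recall_at_k(ground_truth, results, k):
    """Percentage of true top-k neighbours found in the top-k results.

    `ground_truth` and `results` are lists (one entry per query) of ID lists.
    """
    hits = 0
    for gt_ids, res_ids in zip(ground_truth, results):
        hits += len(set(gt_ids[:k]) & set(res_ids[:k]))
    return 100.0 * hits / (k * len(ground_truth))
```

If the brute-force neighbours for a handful of queries disagree with the ground-truth file used by the benchmark, the low recall is more likely a ground-truth or distance-metric mismatch than a problem with the search.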

Dataset Description


  • Dimensions: 512
  • Number of Points: 99800
  • Data type: float
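
A quick way to rule out a malformed input file is to compare the file's header with its size on disk. The sketch below assumes the `.bin` layout described in the DiskANN README: a 4-byte integer point count, a 4-byte integer dimension, then the vectors as contiguous values. It also assumes little-endian byte order. `check_bin_header` is a hypothetical helper, not a DiskANN API. Under that assumption, 99800 points of 512 float dimensions should occupy 8 + 99800 × 512 × 4 = 204,390,408 bytes.

```python
import os
import struct


def check_bin_header(path, bytes_per_value=4):
    """Read (npts, dim) from the header and check it against the file size.

    Assumed layout: int32 npts, int32 dim, then npts * dim values of
    `bytes_per_value` bytes each (4 for float32), little-endian.
    Returns (npts, dim, size_matches).
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        raise ValueError("file is too short to contain an 8-byte header")
    npts, dim = struct.unpack("<ii", header)
    expected_size = 8 + npts * dim * bytes_per_value
    return npts, dim, expected_size == os.path.getsize(path)
```

For the dataset described here, the expected result would be `(99800, 512, True)`; any other value suggests the file was written with a different layout or data type than the benchmark assumes.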

Error

See above: the search results look wrong (recall drops as L increases, and QPS is low).

Your Environment