larsgottesbueren/gp-ann

Small-scale experiments

Closed this issue · 4 comments

Hi @larsgottesbueren,

Thank you for making the code available - much appreciated!
I have been trying to reproduce the results in the paper and have a question; I would be very grateful for some assistance here. I wanted to start with the small-scale experiments to check that everything builds and runs correctly. I wrote a script to download the SIFT and GloVe datasets (attached), ran it, and then ran small-scale-experiments.py, but obtained results different from those shown in Figure 8 of the paper. The CSV files are also attached. Should these match, or am I comparing apples to oranges here?

exp_outputs.zip

Best,
Jakub

Hi Jakub,

Thanks for your interest in my work and for trying out the code. This should be an apples-to-apples comparison :), but your results clearly differ from those in the paper, so something is off. To be specific, the configuration reported in the paper should be routing = KMTR, shard query = BruteForce. I reran the experiment locally for SIFT + GP and got results similar to those in the paper.

Looking at your script, I noticed that you use the ground truth from TensorFlow Datasets. I haven't used that before; the source I got the data from didn't have ground-truth labels, so I computed them from scratch instead. If you delete the sift.ground_truth.fbin file, this will be done automatically.
Sidenote: the fbin extension is a typo on my end; it should be just bin. The fix is pushed.
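
For what it's worth, "computed from scratch" just means an exact top-k search over the base vectors. A minimal numpy sketch of the idea (illustrative names only, not the code the repo actually runs):

import numpy as np

def brute_force_ground_truth(base, queries, k):
    # Exact top-k neighbors by Euclidean distance, computed per query.
    # Cast to float32 so uint8 SIFT coordinates don't overflow on subtraction.
    base = base.astype(np.float32)
    ids = np.empty((len(queries), k), dtype=np.int64)
    for qi, q in enumerate(queries.astype(np.float32)):
        dists = np.linalg.norm(base - q, axis=1)
        ids[qi] = np.argsort(dists)[:k]
    return ids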

Can you also share the logs from your run for SIFT + GP? Specifically, I am interested in the summary of the partitioning call. It should look something like this:

Partition summary:
  Number of blocks: 16
  Edge cut:         406924
  Imbalance:        0.04576
  Feasible:         yes
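
For reference, the edge cut is the number of graph edges whose endpoints land in different blocks, and the imbalance measures how far the largest block exceeds the average block size. As a rough numpy sketch with illustrative names (not necessarily the exact definitions used in the code):

import numpy as np

def partition_stats(edges, partition, num_blocks):
    # edges: (m, 2) array of graph edges; partition: block id per point.
    partition = np.asarray(partition)
    cut = int(np.count_nonzero(partition[edges[:, 0]] != partition[edges[:, 1]]))
    sizes = np.bincount(partition, minlength=num_blocks)
    # Imbalance as "largest block size relative to the average block size".
    imbalance = sizes.max() * num_blocks / len(partition) - 1.0
    return cut, imbalance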

Finally, here are the CSVs (GP, KM, BKM) used in the paper, along with a GP partition similar to the one in the paper (what I had on my local machine).
sift.zip
This should help narrow the issue down.

I hope this helps. Please do follow up, whether you resolve this or stumble across more issues.

Thanks,
Lars

Wow - thank you for the unreasonably fast response :)
I tried deleting the ground-truth files and letting them be regenerated; the results then changed slightly, but not by much. (I guess SIFT is a bit special in that the points actually have 8-bit integer coordinates, so many distances might be equal.)
I'm attaching some runs.
It seems that the partitioning produced by GP cuts about 1.7M edges. Does yours cut 0.4M?
In fact, when I use your GP partition file and only run run_queries_on_all_datasets(), the recall is horrible (as if it were a random partition).

But then, it's not only GP that differs from the paper; KMeans actually does a bit better for me than in the plot.
So everything is weird...

outputs2.zip

Best,
Jakub

Hmmm, this is very weird indeed. Yes, my partition has an edge cut of 0.4M, so clearly something is very wrong. I will dig deeper to get to the bottom of this. It might take a bit longer than last time, though.

Btw, many distances being equal should not affect recall: everything with distance <= the distance to the rank-k point counts as recalled. There can, however, be fluctuation from non-determinism in the graph partitioning and HNSW building steps.
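
To make the tie handling concrete, recall for a single query can be computed roughly like this (a sketch with illustrative names, not the repo's actual code):

import numpy as np

def tie_aware_recall(returned_dists, gt_dists, k):
    # returned_dists: distances of the candidates returned for one query.
    # gt_dists: exact distances to the true neighbors, sorted ascending.
    # Anything at distance <= the rank-k ground-truth distance counts as a hit,
    # so exact ties at the boundary don't penalize recall.
    threshold = gt_dists[k - 1]
    hits = np.count_nonzero(np.asarray(returned_dists) <= threshold)
    return min(hits, k) / k

With this definition, two runs that break ties differently still get the same recall.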

Closing this issue, since we followed up and resolved it via email :)