facebookresearch/faiss

A problem with fastscan accuracy

Tickdack opened this issue · 9 comments

Summary

I built three indexes with the faiss C++ API to find the most accurate one that is still as fast as possible, and I test their accuracy from Python. I use a flat index as the baseline and compare an IVFPQ index and a fastscan index against it, but I find that the fastscan accuracy is very low.

Platform

Running on:

  • [*] CPU
  • GPU

Interface:

  • [*] C++
  • [*] Python

Reproduction instructions

Here is my test code:

    import time

    # Baseline results from the flat (exact) index.
    # read_idx, baseline_idx_path, idx_paths, query_vecs and recall_num
    # are defined elsewhere in the test script.
    b_idx = read_idx(baseline_idx_path)
    D, I = b_idx.search(query_vecs, recall_num)

    # For each query, keep the set of ids returned by the flat index.
    baseline_sets = [set(row) for row in I.tolist()]

    # Search with the IVFPQ and fastscan indexes and print the overlap
    # with the baseline.
    print("Top{} recall acc rate:".format(recall_num))
    for idx_path in idx_paths:
        m_idx = read_idx(idx_path)
        start = time.time()
        m_idx.nprobe = 20
        D, I = m_idx.search(query_vecs, recall_num)
        end = time.time()
        print("time: {}".format(end - start))
        m_rets_ids = I.tolist()

        # Count returned ids that also appear in the baseline result
        # for the same query.
        recall_contain = 0
        sum_recall = len(I) * len(I[0])
        for i in range(len(m_rets_ids)):
            for m_ret_id in m_rets_ids[i]:
                if m_ret_id in baseline_sets[i]:
                    recall_contain += 1

        print(idx_path + ':' + str(float(recall_contain) / sum_recall * 100) + '%')

Here are my test results:

Top10000 recall acc rate:
time: 0.14550495147705078
384_IVF20480_PQ96x8_IP.index:79.19808510638298%
time: 0.01808476448059082
384_IVF20480_PQ96x4fs_IP.index:0.7340425531914894%

In practice, IVFPQFastScan must be used with a refinement step.
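
To make the refinement idea concrete, here is a minimal pure-Python sketch. It is not faiss code: `lossy_code`, `top_k` and `search_refined` are hypothetical helpers, and snapping components to a grid is only a stand-in for the precision lost by 4-bit PQ codes. The first stage over-fetches `k * k_factor` candidates using the lossy codes, and the second stage re-ranks only that shortlist with the original vectors. In faiss itself the same two-stage scheme is provided by the `IndexRefineFlat` wrapper (factory suffix `RFlat`) and its `k_factor` field.

```python
import random

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def lossy_code(v, step=0.5):
    # Hypothetical stand-in for a coarse code such as 4-bit PQ:
    # snap every component to a grid, which loses precision.
    return [round(x / step) * step for x in v]

def top_k(query, candidates, table, k):
    # Rank candidate ids by inner product against table[id], best first.
    ranked = sorted(((inner_product(query, table[i]), i) for i in candidates),
                    reverse=True)
    return [i for _, i in ranked[:k]]

def search_refined(query, codes, vectors, k, k_factor):
    # Stage 1: cheap search on the lossy codes, over-fetching k * k_factor ids.
    shortlist = top_k(query, range(len(codes)), codes, k * k_factor)
    # Stage 2: re-rank only the shortlist with the original vectors.
    return top_k(query, shortlist, vectors, k)

random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(16)] for _ in range(500)]
codes = [lossy_code(v) for v in vectors]
query = [random.gauss(0, 1) for _ in range(16)]
k = 10

exact = top_k(query, range(len(vectors)), vectors, k)    # flat baseline
approx = top_k(query, range(len(codes)), codes, k)       # codes only
refined = search_refined(query, codes, vectors, k, k_factor=4)

hits_approx = len(set(exact) & set(approx))
hits_refined = len(set(exact) & set(refined))
print("overlap with exact top-{}: codes only {}, refined {}".format(
    k, hits_approx, hits_refined))
```

Because the shortlist always contains the codes-only top-k, the refined overlap with the baseline can never be lower than the codes-only overlap. The price is that the original vectors must be kept alongside the codes.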

thank you very much

Note also that fast scan in this example uses 2x less memory (96x4 bits vs. 96x8 bits), so for a fair comparison the fast-scan version should be built with PQ192x4fs.
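
The memory arithmetic behind this remark can be checked with a few lines. The helper name `pq_code_bytes` is made up for illustration; it counts only the PQ code itself (sub-quantizers times bits per sub-quantizer, rounded up to whole bytes) and ignores stored ids and the coarse quantizer.

```python
def pq_code_bytes(m, nbits):
    """Bytes per vector for a PQ code with m sub-quantizers of nbits each."""
    return (m * nbits + 7) // 8  # round up to whole bytes

for name, m, nbits in [("PQ96x8", 96, 8), ("PQ96x4fs", 96, 4), ("PQ192x4fs", 192, 4)]:
    print("{}: {} bytes per vector".format(name, pq_code_bytes(m, nbits)))
```

PQ96x4fs stores half as many bytes per vector as PQ96x8, while PQ192x4fs matches it, which is why the latter is the fair comparison.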

May I ask if you are willing to share the adjusted results and whether fastscan will perform better?

Thank you! With your suggestions, its accuracy has improved a lot. How can I improve it further?

Top10000 recall acc rate:
time: 0.0404963493347168
384_IVF20480_PQ96x8_IP.index:65.33666666666666%
time: 0.09140467643737793
384_IVF30720_PQ192X4fs_IP.index:65.2388888888889%

Yes, and you can write in Chinese, my friend.

Besides performance, I have another question: when I use fastscan, it returns some repeated ids, and sometimes it returns -1 as a result. I have adjusted some parameters, such as nprobe and implem, but that did not help. Can you give me some suggestions? Thank you!

-1 is possible, see https://github.com/facebookresearch/faiss/wiki/FAQ#what-does-it-mean-when-a-search-returns--1-ids
Repeated ids should not happen. If you encounter them, please file a bug report.
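
Before filing a report, it may help to quantify both symptoms. The sketch below uses a hypothetical helper, `summarize_ids`, that takes the id matrix as nested lists (for example `I.tolist()` from the test code) and counts -1 entries and repeated ids per query. As the linked FAQ explains, -1 marks slots for which no result was found, which is plausible here when the 20 probed lists hold fewer than 10000 vectors in total.

```python
from collections import Counter

def summarize_ids(rows):
    """Return (number of -1 slots, {row number: {id: count}} for repeated ids)."""
    missing = 0
    duplicated = {}
    for row_no, row in enumerate(rows):
        valid = [i for i in row if i != -1]
        missing += len(row) - len(valid)
        # Repeats are counted among real ids only; -1 padding is not a duplicate.
        repeats = {i: c for i, c in Counter(valid).items() if c > 1}
        if repeats:
            duplicated[row_no] = repeats
    return missing, duplicated

# Made-up result rows for illustration only.
example_rows = [[7, 3, 9, -1], [4, 4, 2, 8]]
missing, duplicated = summarize_ids(example_rows)
print("missing slots:", missing)
print("rows with repeated ids:", duplicated)
```

A non-empty `duplicated` dictionary on real search output, together with the index type and parameters, would be useful material for the bug report.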

thank you, sir