some problem about fastscan
Tickdack opened this issue · 9 comments
Summary
i make three index with faiss Cpp to find the most accurate index, to be as quickly as possible, i use python to test the index accuracy, I use flat as the baseline and ivfpq, fastscan as the be tested, but i find the fastscan accuracy is very very low
Platform
Running on:
- [*] CPU
- GPU
Interface:
- [*] C++
- [*] Python
Reproduction instructions
Here is my test code:
# baseline res use flat
b_idx = read_idx(baseline_idx_path)
D, I = b_idx.search(query_vecs, recall_num)
b_rets_ids = I.tolist()
# generate flat res id-map
baseline_map = {}
for i in range(len(b_rets_ids)):
baseline_map[i] = {}
for ret_id in b_rets_ids[i]:
baseline_map[i][ret_id] = 1
# use ivfpq & scann search and print acc
print("Top{} recall acc rate:".format(recall_num))
for idx_path in idx_paths:
m_idx = read_idx(idx_path)
start = time.time()
m_idx.nprobe = 20
D, I = m_idx.search(query_vecs, recall_num)
end = time.time()
print("time: {}".format(end -start))
m_rets_ids = I.tolist()
recall_contain = 0
sum_recall = len(I) * len(I[0])
for i in range(len(m_rets_ids)):
for m_ret_id in m_rets_ids[i]:
if m_ret_id in baseline_map[i]:
recall_contain += 1
print(idx_path + ':' + str(float(recall_contain) / sum_recall * 100) + '%')
here is my test ret:
Top10000 recall acc rate:
time: 0.14550495147705078
384_IVF20480_PQ96x8_IP.index:79.19808510638298%
time: 0.01808476448059082
384_IVF20480_PQ96x4fs_IP.index:0.7340425531914894%
in practice, ivfpqfastscan must be used with refining
in practice, ivfpqfastscan must be used with refining
thank you very much
note also that Fast scan in this example uses 2x less memory (96x4 bits vs. 96x8 bits) so for a fair comparison, the fast-scan version should be built with PQ192x4fs
May I ask if you are willing to share the adjusted results and whether fastscan will perform better?
note also that Fast scan in this example uses 2x less memory (96x4 bits vs. 96x8 bits) so for a fair comparison, the fast-scan version should be built with PQ192x4fs
thank you! with your suggestions, I have improved its performance a lot. How can I further improve its accuracy?
Top10000 recall acc rate:
time: 0.0404963493347168
384_IVF20480_PQ96x8_IP.index:65.33666666666666%
time: 0.09140467643737793
384_IVF30720_PQ192X4fs_IP.index:65.2388888888889%
May I ask if you are willing to share the adjusted results and whether fastscan will perform better?
yes, you can speak chinese, my folk
note also that Fast scan in this example uses 2x less memory (96x4 bits vs. 96x8 bits) so for a fair comparison, the fast-scan version should be built with PQ192x4fs
Besides performance, I have another question, when i use fastscan, i find it return some repeating ids, and some time i find it returns -1 as result, i have adjusted some parameters, as nprobe
and implem
, but it not work, can you give me some suggestions? thank you!
-1 is possible, see https://github.com/facebookresearch/faiss/wiki/FAQ#what-does-it-mean-when-a-search-returns--1-ids
repeating ids should not happen. If you encounter them, please fill a bug report.
-1 is possible, see https://github.com/facebookresearch/faiss/wiki/FAQ#what-does-it-mean-when-a-search-returns--1-ids repeating ids should not happen. If you encounter them, please fill a bug report.
thank you, sir