facebookresearch/faiss

FAISS GPU Index Returns Inconsistent Results on Consecutive Prints

rangehow opened this issue · 2 comments

Issue Description:
After performing a search using a FAISS GPU index, consecutive prints of the search results (index values) yield inconsistent outputs. The first print shows values clearly outside the index range, while the second print displays correct results.

Steps to Reproduce:

Perform a search using a FAISS GPU index (the query vector and the FAISS GPU index should be placed on different devices, e.g. cuda:0 and cuda:1).
Immediately print the returned index values twice in succession.

import faiss
import faiss.contrib.torch_utils  # lets index.search accept and return torch tensors
import torch

# My index type is IVF256_HNSW32,SQ8
# (path, index_name, vector, k, filter and fetch_k come from the surrounding application)
index = faiss.read_index(
    str(path / "{index_name}.faiss".format(index_name=index_name))
)

print(index.ntotal)  # 473000
index = faiss.index_cpu_to_gpu(faiss.StandardGpuResources(), 1, index)

# important: the query vector is on cuda:0, the index is on cuda:1
scores, indices = index.search(vector, k if filter is None else fetch_k)

torch.cuda.synchronize()  # without this, the problem will occur in the **second** query
print(indices)
print(indices)

Observed Behavior:

The first print displays index values far exceeding the actual index size (e.g., values like 4583111473221640818, while the index only contains about 473000 vectors).

tensor([[4583111473221640818, 4589232364260973351, 4619801682321543224,
         4620444067705072548, 4623237342634618959, 4624239654857833036,
         4623615402837892849, 4616921717781585215, 4616387161855799078,
         4614467105316646528],
        [ 30822, 230074, 128917, 286551, 443909, 443465, 315651, 457551, 247513, 285235],
        [465657, 465531, 464361, 464334, 464973, 465180, 464550, 464532, 464928, 464937],
        [ 20035,  19936, 466271, 466023, 467063,  53705,  63804,  11154,  73523,  54755],
        [471679, 470846, 470349, 470069, 470741, 470615, 470272, 470174, 471833, 471406]],
       device='cuda:0')

The second print (immediately following the first) shows correct index values within the expected range.

tensor([[ 30133, 29611, 29959, 18119, 29785, 417361, 417196, 417136, 418696,
418711],
[ 30822, 230074, 128917, 286551, 443909, 443465, 315651, 457551, 247513,
285235],
[465657, 465531, 464361, 464334, 464973, 465180, 464550, 464532, 464928,
464937],
[ 20035, 19936, 466271, 466023, 467063, 53705, 63804, 11154, 73523,
54755],
[471679, 470846, 470349, 470069, 470741, 470615, 470272, 470174, 471833,
471406]], device='cuda:0')

Environment:

Using PyTorch and CUDA
FAISS GPU index
Resolution:
Adding torch.cuda.synchronize() before the first print statement resolves the issue; both prints then show consistent and correct results.
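
A bare torch.cuda.synchronize() only waits on the current device, so a more explicit variant of the same workaround is to synchronize every device involved. The following is only a sketch reusing the variable names from the snippet above (index, vector, k); sync_devices is a hypothetical helper, and this variant was not verified in this thread:

import torch

def sync_devices(*devices):
    # Wait for all outstanding kernels on each listed device.
    for dev in devices:
        torch.cuda.synchronize(torch.device(dev))

scores, indices = index.search(vector, k)  # query on cuda:0, index on cuda:1
sync_devices("cuda:0", "cuda:1")           # instead of a bare torch.cuda.synchronize()
print(indices)
print(indices)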

Hypothesized Cause:
This issue may be related to the asynchronous nature of GPU operations. The first print might occur before the FAISS search operation on the GPU has fully completed, resulting in incorrect or partially computed results being displayed.
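
If the cause really is an unfinished asynchronous operation across devices, another workaround (not verified in this thread) would be to copy the query to the index's device before searching, so FAISS never has to read across devices. A minimal sketch, again reusing the variable names from the snippet above:

# Copy the query onto the index's device before searching (the index lives on cuda:1).
query_on_index_device = vector.to("cuda:1")
scores, indices = index.search(query_on_index_device, k)
# Move the results back to cuda:0 if that is where they are consumed.
indices = indices.to("cuda:0")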

Hi @rangehow, could you provide a complete repro? It's strange that the vector is on device 0 and the index on device 1, yet the search runs without errors. For instance:
Is vector a torch tensor or a numpy array? Do you observe the same behaviour if both the index and the query vector are on the same device, without having to use torch synchronize?
Note that torch.cuda.synchronize() only synchronizes the execution of CUDA operations on a specified device, so I am not sure why this would fix things in this scenario.
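
For reference, a self-contained script of the kind being requested could look roughly like the one below. It is only a sketch: the random data, dimension, and dataset size are made up, and it assumes faiss.contrib.torch_utils is imported so the GPU index accepts torch tensors, as in the original snippet; it has not been run as part of this thread.

import numpy as np
import torch
import faiss
import faiss.contrib.torch_utils  # lets GPU indexes accept/return torch tensors

d = 64
xb = np.random.rand(50_000, d).astype("float32")

cpu_index = faiss.index_factory(d, "IVF256_HNSW32,SQ8")  # same factory string as the report
cpu_index.train(xb)
cpu_index.add(xb)

res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 1, cpu_index)  # index on cuda:1

# Case 1: query on a different device than the index (the reported setup).
q0 = torch.rand(5, d, device="cuda:0")
_, idx_cross = gpu_index.search(q0, 10)
print(idx_cross)  # reported to sometimes contain out-of-range values
print(idx_cross)

# Case 2: query on the same device as the index (the control case asked about above).
q1 = q0.to("cuda:1")
_, idx_same = gpu_index.search(q1, 10)
print(idx_same)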

It is indeed quite strange. At the time, I assumed FAISS would internally handle device migration for tensors. I believe this scenario is quite common: once the database grows large, keeping both the database and the LLM on the same device can cause an OOM (out of memory) error, which is why they are placed on two different devices. The vectors are torch tensors.
I'm sorry, but since too much time has passed, I may not be able to spend the time to write minimal code that reproduces this issue. I'll go ahead and close it.