facebookresearch/faiss

Segmentation fault

godkillok opened this issue · 10 comments

Segmentation fault

Running on:

  • [v] CPU

Interface:

  • [v] Python

    training_vectors.shape is (2357720, 100). Training completes, but the search call `index.search(training_vectors[0:10000], 100)` always reports "Segmentation fault".
    The code is as follows:

    import time
    import logging

    import faiss

    (num, d) = training_vectors.shape
    t1 = time.time()

    nlist = max(5, int(num / 500))
    faiss.normalize_L2(training_vectors)
    quantizer = faiss.IndexFlatIP(d)

    index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
    index.train(training_vectors)
    index.nprobe = max(1, int(nlist * 0.6))  # default nprobe is 1, try a few more
    index.add(training_vectors)
    t2 = time.time()
    logging.info('{} times is {}'.format('add and train', t2 - t1))

    t1 = time.time()

    score, sim_id = index.search(training_vectors[0:10000], 100)  # this line goes wrong

    t2 = time.time()

I don't see an obvious reason why this could go wrong.
Does the segfault still occur when searching 1000 vectors instead of 10000?

wuhu commented

I also get a segmentation fault (also using CPU).

This happens only if the faiss index is a member of an object, for instance here:

import faiss
import numpy as np

class Products:
    def __init__(self, path):
        self.embeddings = np.load(f'{path}/embeddings.npy')  # this loads a ~ 100000x512 float32 array
        quantizer = faiss.IndexFlatIP(512)
        self.index = faiss.IndexIVFFlat(quantizer, 512, 100, faiss.METRIC_L2)
        self.index.train(self.embeddings)
        self.index.add(self.embeddings)

    def find_nearest(self, index, n):
        return self.index.search(self.embeddings[index].reshape(1, -1), n)

p = Products('path/to/the/npy')
p.find_nearest(100, 10)  # segfault happens here

When implementing the same without a class, there is no segmentation fault:

import faiss
import numpy as np

path = 'path/to/the/npy'
embeddings = np.load(f'{path}/embeddings.npy')  # this loads a ~ 100000x512 float32 array
quantizer = faiss.IndexFlatIP(512)
index = faiss.IndexIVFFlat(quantizer, 512, 100, faiss.METRIC_L2)
index.train(embeddings)
index.add(embeddings)

def find_nearest(i, n):
    return index.search(embeddings[i].reshape(1, -1), n)

find_nearest(100, 10)  # no segfault, works as expected

@wuhu This is because the quantizer gets garbage-collected by Python. You could do `self.quantizer = ...` instead, so that it is not GCed before your `Products` instance is destroyed.

wuhu commented

@beauby Thanks for the quick reply! That worked.

Enet4 commented

This seems to be another case of crashes in Python code. That makes me wonder: what would be the consequences of having the bindings automatically keep references to nested indexes on construction of new indexes (as in, apply the dont_dealloc_me trick)? I would imagine this to be safer and more predictable than requiring users to keep references by themselves in Python-land (although I probably overlooked something).

It is on our TODO list. It requires a SWIG trick that adds that reference whenever an object is passed in by reference, across a few dozen functions and constructors.
The reasons why we have not done it yet are:

  • I don't know exactly where in SWIG this has to be done
  • this problem hits only mid-level users. Low-level users know the library well enough and understand how the references work, and high-level users only use the index_factory, which does not have this problem.

Sorry for the late reply. I double-checked my code and found the same problem as @wuhu; it was not caused by the line I mentioned.

@godkillok Great – can we close this issue then?

While automatic reference counting is not yet implemented, this should at least be documented somewhere.

@asanakoy It is actually done now. If you are encountering the same behavior, it is a bug, in which case please open a separate issue.