ytgui/temp

nearest neighbor search

Opened this issue · 3 comments

ytgui commented

https://en.wikipedia.org/wiki/Nearest_neighbor_search

  • k-nearest neighbors
  • approximate nearest neighbor
  • nearest neighbor distance ratio
  • fixed-radius near neighbors
  • all nearest neighbors
ytgui commented

Faiss

1-Flat

// build index
faiss::IndexFlatL2 index(n_dim);
index.add(n_batch, x_train);

// sanity check, make sure x_train's neighbor is it self
// n_y eq top_k
index.search(n_batch, x_train, n_y, y_distances, y_labels);

// real search
index.search(n_batch, x_batch, n_y, y_distances, y_labels);

IndexFlat.h

  • METRIC_INNER_PRODUCT
  • METRIC_L2
  • METRIC_L1
  • METRIC_Linf
  • METRIC_Lp

IndexFlat.cpp

ugly!!

  • IndexFlatL2::search -> knn_L2sqr -> knn_L2sqr_blas
knn_L2sqr_blas(
knn_L2sqr(x_batch,
                 x_train,
                 n_dim,
                 n_batch,
                 n_train,
                 {n_batch, n_y, y_labels, y_distances}
), [](float dis) { return dis; // do nothing to dis });

ytgui commented

? 2-IVFFlat

// There are two parameters to the search method:
// nlist, the number of cells
// nprobe, the number of cells (out of nlist) that are visited to perform a search.
// The search time roughly increases linearly with the number of probes plus some constant due to the quantization.

// index
faiss::IndexFlatL2 quantizer(n_dim);
faiss::IndexIVFFlat index(&quantizer, n_dim, n_list, faiss::METRIC_L2);

// build
assert(!index.is_trained);
index.train(n_train, x_train);
assert(index.is_trained);
index.add(n_train, x_train);

// search
index.search(n_batch, x_batch, n_y, y_distances, y_labels);

// default nprobe is 1
index.nprobe = 10;
index.search(n_batch, x_batch, n_y, y_distances, y_labels);