gann (go-approximate-nearest-neighbor) is a library for approximate nearest neighbor search purely written in golang.
The implemented algorithm is truly inspired by Annoy (https://github.com/spotify/annoy).
- purely written in Go: no dependencies out of Go world.
- easy to tune with a bit of parameters
go get github.com/mathetake/gann
name | type | description | run-time complexity | space complexity | accuracy |
---|---|---|---|---|---|
dim | int | dimension of target vectors | the larger, the more expensive | the larger, the more expensive | N/A |
nTree | int | # of trees | the larger, the more expensive | the larger, the more expensive | the larger, the more accurate |
k | int | maximum # of items in a single leaf | the larger, the less expensive | N/A | the larger, the less accurate |
name | type | description | time complexity | accuracy |
---|---|---|---|---|
searchNum | int | # of requested neighbors | the larger, the more expensive | N/A |
bucketScale | float64 | affects the size of bucket |
the larger, the more expensive | the larger, the more accurate |
bucketScale
affects the size of bucket
which consists of items for exact distance calculation.
The actual size of the bucket is calculated by int(searchNum * bucketScale)
.
In the search phase, we traverse index trees and continuously put items on reached leaves to the bucket until the bucket becomes full. Then we calculate the exact distances between a item in the bucket and the query vector to get approximate nearest neighbors.
Therefore, the larger bucketScale
, the more computational complexity while the more accurate result to be produced.
package main
import (
"fmt"
"math/rand"
"time"
"github.com/mathetake/gann"
"github.com/mathetake/gann/metric"
)
var (
dim = 3
nTrees = 2
k = 10
nItem = 1000
)
func main() {
rawItems := make([][]float64, 0, nItem)
rand.Seed(time.Now().UnixNano())
for i := 0; i < nItem; i++ {
item := make([]float64, 0, dim)
for j := 0; j < dim; j++ {
item = append(item, rand.Float64())
}
rawItems = append(rawItems, item)
}
m, err := metric.NewCosineMetric(dim)
if err != nil {
// err handling
return
}
// create index
idx, err := gann.CreateNewIndex(rawItems, dim, nTrees, k, m)
if err != nil {
// error handling
return
}
// search
var searchNum = 5
var bucketScale float64 = 10
q := []float64{0.1, 0.02, 0.001}
res, err := idx.GetANNbyVector(q, searchNum, bucketScale)
if err != nil {
// error handling
return
}
fmt.Printf("res: %v\n", res)
}
- https://github.com/spotify/annoy
- https://en.wikipedia.org/wiki/Nearest_neighbor_search#Approximate_nearest_neighbor
MIT