
Nearest neighbor search for Rails and Postgres

Add this line to your application’s Gemfile:

gem "neighbor"

Choose An Extension

Neighbor supports two extensions: cube and vector. cube ships with Postgres, while vector supports more dimensions and approximate nearest neighbor search.

For cube, run:

rails generate neighbor:cube
rails db:migrate

For vector, install pgvector and run:

rails generate neighbor:vector
rails db:migrate

Getting Started

Create a migration

class AddEmbeddingToItems < ActiveRecord::Migration[7.1]
  def change
    add_column :items, :embedding, :cube
    # or
    add_column :items, :embedding, :vector, limit: 3 # dimensions
  end
end

Add to your model

class Item < ApplicationRecord
  has_neighbors :embedding
end

Update the vectors

item.update(embedding: [1.0, 1.2, 0.5])

Get the nearest neighbors to a record

item.nearest_neighbors(:embedding, distance: "euclidean").first(5)

Get the nearest neighbors to a vector

Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean").first(5)


Supported values for the distance option are:

  • euclidean
  • cosine
  • taxicab (cube only)
  • chebyshev (cube only)
  • inner_product (vector only)
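As a rough illustration of what each metric measures, here is a plain-Ruby sketch that computes them in memory. The method names are made up for this example and are not part of Neighbor, which computes distances inside Postgres.

```ruby
# Plain-Ruby definitions of the listed metrics (illustrative only).

# Straight-line distance
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Sum of absolute differences per dimension
def taxicab(a, b)
  a.zip(b).sum { |x, y| (x - y).abs }
end

# Largest absolute difference in any single dimension
def chebyshev(a, b)
  a.zip(b).map { |x, y| (x - y).abs }.max
end

# Dot product (larger means more similar, unlike the distances above)
def inner_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# 1 minus the cosine of the angle between the vectors
def cosine_distance(a, b)
  norms = Math.sqrt(inner_product(a, a)) * Math.sqrt(inner_product(b, b))
  1.0 - inner_product(a, b) / norms
end

a = [1.0, 2.0, 2.0]
b = [4.0, 6.0, 2.0]

puts euclidean(a, b)     # 5.0
puts taxicab(a, b)       # 7.0
puts chebyshev(a, b)     # 4.0
puts inner_product(a, b) # 20.0
puts cosine_distance(a, b)
```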

For cosine distance with cube, vectors must be normalized before being stored.

class Item < ApplicationRecord
  has_neighbors :embedding, normalize: true
end
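One way to see why normalization helps: for unit-length vectors, the squared Euclidean distance equals twice the cosine distance, so both give the same ranking. The sketch below checks that identity in plain Ruby. The helper names are illustrative, and it does not claim to mirror the query Neighbor actually generates.

```ruby
# Scale a vector to unit length (assumes it is not the zero vector)
def normalize(v)
  norm = Math.sqrt(v.sum { |x| x * x })
  v.map { |x| x / norm }
end

def squared_euclidean(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

a = [1.0, 1.2, 0.5]
b = [0.9, 1.3, 1.1]

# Squared Euclidean distance between the normalized vectors...
lhs = squared_euclidean(normalize(a), normalize(b))
# ...matches twice the cosine distance between the originals
rhs = 2.0 * cosine_distance(a, b)

puts((lhs - rhs).abs < 1e-9) # true
```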

For inner product with cube, see this example.

Records returned from nearest_neighbors will have a neighbor_distance attribute

nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
nearest_item.neighbor_distance
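To make this concrete, here is a minimal in-memory sketch of the shape of a nearest-neighbor result: rows ordered by distance, each carrying the distance it was ranked by. The `brute_force_neighbors` method and the hash-based rows are hypothetical and exist only for this illustration. Neighbor does the ordering in SQL.

```ruby
# Hypothetical brute-force search: rank every row by Euclidean distance
# to the query and attach that distance under :neighbor_distance.
def brute_force_neighbors(rows, query, limit)
  rows
    .map do |row|
      distance = Math.sqrt(row[:embedding].zip(query).sum { |x, y| (x - y)**2 })
      row.merge(neighbor_distance: distance)
    end
    .sort_by { |row| row[:neighbor_distance] }
    .first(limit)
end

rows = [
  {name: "a", embedding: [0.0, 0.0, 0.0]},
  {name: "b", embedding: [1.0, 2.0, 2.0]},
  {name: "c", embedding: [3.0, 4.0, 0.0]}
]

nearest = brute_force_neighbors(rows, [1.0, 2.0, 2.0], 2)
nearest.each { |row| puts "#{row[:name]}: #{row[:neighbor_distance]}" }
# b: 0.0
# a: 3.0
```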


The cube data type can have up to 100 dimensions by default. See the Postgres docs for how to increase this. The vector data type can have up to 16,000 dimensions, and vectors with up to 2,000 dimensions can be indexed.

For cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.

class Item < ApplicationRecord
  has_neighbors :embedding, dimensions: 3
end


For vector, add an approximate index to speed up queries. Create a migration with:

class AddIndexToItemsEmbedding < ActiveRecord::Migration[7.1]
  def change
    add_index :items, :embedding, using: :hnsw, opclass: :vector_l2_ops
    # or
    add_index :items, :embedding, using: :ivfflat, opclass: :vector_l2_ops
  end
end

Use :vector_cosine_ops for cosine distance and :vector_ip_ops for inner product.

Set the size of the dynamic candidate list with HNSW

Item.connection.execute("SET hnsw.ef_search = 100")

Or the number of probes with IVFFlat

Item.connection.execute("SET ivfflat.probes = 3")
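A plain `SET` stays in effect for the rest of the database session, which can matter when connections are pooled. One option is to scope the setting to a single transaction with Postgres's `SET LOCAL`. The helper below is a sketch, not part of Neighbor: `with_local_setting` is a made-up name, and it assumes the model responds to `transaction` and `connection` the way Active Record models do.

```ruby
# Hypothetical helper: applies a Postgres setting for one transaction only.
# SET LOCAL reverts automatically when the transaction ends.
def with_local_setting(model, name, value)
  allowed = ["hnsw.ef_search", "ivfflat.probes"]
  raise ArgumentError, "unsupported setting: #{name}" unless allowed.include?(name)

  model.transaction do
    # Integer() raises on non-numeric input, so only a number reaches the SQL
    model.connection.execute("SET LOCAL #{name} = #{Integer(value)}")
    yield
  end
end

# Usage in a Rails app (not run here, since it needs a database):
#
#   with_local_setting(Item, "hnsw.ef_search", 100) do
#     Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean").first(5)
#   end
```

Calling `first(5)` inside the block matters, because the query has to run before the transaction ends for the setting to apply.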


OpenAI Embeddings

Generate a model

rails generate model Document content:text embedding:vector{1536}
rails db:migrate

And add has_neighbors

class Document < ApplicationRecord
  has_neighbors :embedding
end

Create a method to call the embeddings API

require "json"
require "net/http"

def fetch_embeddings(input)
  url = "https://api.openai.com/v1/embeddings"
  headers = {
    "Authorization" => "Bearer #{ENV.fetch("OPENAI_API_KEY")}",
    "Content-Type" => "application/json"
  }
  data = {
    input: input,
    model: "text-embedding-ada-002"
  }

  response = Net::HTTP.post(URI(url), data.to_json, headers)
  JSON.parse(response.body)["data"].map { |v| v["embedding"] }
end
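The parsing step can be made a little more defensive. The sketch below is a variant built on assumptions, not the author's code: `parse_embeddings` is a hypothetical helper, and it assumes the response body has a `data` array whose entries carry `embedding` and `index` keys, with an `error` object present on failure. The sample JSON is hand-written for illustration.

```ruby
require "json"

# Hypothetical helper: raises on an error payload and returns embeddings
# ordered by their "index" so they line up with the input array.
def parse_embeddings(body)
  payload = JSON.parse(body)
  if payload["error"]
    raise "Embeddings request failed: #{payload.dig("error", "message")}"
  end

  payload.fetch("data").sort_by { |v| v["index"] }.map { |v| v["embedding"] }
end

# Hand-written sample with entries deliberately out of order
sample = {
  data: [
    {index: 1, embedding: [0.3, 0.4]},
    {index: 0, embedding: [0.1, 0.2]}
  ]
}.to_json

p parse_embeddings(sample) # [[0.1, 0.2], [0.3, 0.4]]
```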

Pass your input

input = [
  "The dog is barking",
  "The cat is purring",
  "The bear is growling"
]
embeddings = fetch_embeddings(input)

Store the embeddings

documents = []
input.zip(embeddings) do |content, embedding|
  documents << {content: content, embedding: embedding}
end
Document.insert_all!(documents) # use create! for Active Record < 6

And get similar documents

document = Document.first
document.nearest_neighbors(:embedding, distance: "cosine").first(5).map(&:content)

See the complete code

Disco Recommendations

You can use Neighbor for online item-based recommendations with Disco. We’ll use MovieLens data for this example.

Generate a model

rails generate model Movie name:string factors:cube
rails db:migrate

And add has_neighbors

class Movie < ApplicationRecord
  has_neighbors :factors, dimensions: 20, normalize: true
end

Fit the recommender

data = Disco.load_movielens
recommender = Disco::Recommender.new(factors: 20)
recommender.fit(data)

Store the item factors

movies = []
recommender.item_ids.each do |item_id|
  movies << {name: item_id, factors: recommender.item_factors(item_id)}
end
Movie.insert_all!(movies) # use create! for Active Record < 6

And get similar movies

movie = Movie.find_by(name: "Star Wars (1977)")
movie.nearest_neighbors(:factors, distance: "cosine").first(5).map(&:name)

See the complete code for cube and vector



If you are upgrading from an earlier version, note that the distance option has moved from has_neighbors to nearest_neighbors, and there is no longer a default. If you use cosine distance, set:

class Item < ApplicationRecord
  has_neighbors normalize: true
end


