🌬️ A vector database implementation with a single dependency (`numpy`).
🎁 It can query 100,000 vectors and return results in about 100 milliseconds.
🏃 It's okay for your prototypes, maybe even more.
**Install from PyPI**

```shell
pip install nano-vectordb
```
**Install from source**

```shell
# clone this repo first
cd nano-vectordb
pip install -e .
```
**Faking your data**:

```python
from nano_vectordb import NanoVectorDB
import numpy as np

data_len = 100_000
fake_dim = 1024
fake_embeds = np.random.rand(data_len, fake_dim)

# ANYFIELDS is a placeholder for any extra metadata fields of your own
fakes_data = [{"__vector__": fake_embeds[i], **ANYFIELDS} for i in range(data_len)]
```
You can add any fields to a record, but there are two reserved keywords:

- `__id__`: if passed, `NanoVectorDB` will use your id; otherwise a generated id is used.
- `__vector__`: required, your embedding as an `np.ndarray`.
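
For example, a minimal sketch of one record (the field name `source` is just an illustration; any extra field is allowed):

```python
import numpy as np

record = {
    "__id__": "doc-42",                  # optional: supply your own id
    "__vector__": np.random.rand(1024),  # required: the embedding
    "source": "example.txt",             # any additional field is stored as-is
}
```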
**Init a DB**:

```python
vdb = NanoVectorDB(fake_dim, storage_file="fool.json")
```
Next time you init `vdb` from `fool.json`, `NanoVectorDB` will load the index automatically.
**Upsert**:

```python
r = vdb.upsert(fakes_data)
print(r["update"], r["insert"])
```
**Query**:

```python
print(vdb.query(np.random.rand(fake_dim)))
```
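
A slightly fuller sketch; the `top_k` parameter and the shape of the returned hits are assumptions about the API, so check the signature in your installed version:

```python
query_vec = np.random.rand(fake_dim)
results = vdb.query(query_vec, top_k=5)  # top_k assumed; hits are dicts
for hit in results:
    print(hit["__id__"])  # each hit carries the stored fields of the record
```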
**Save**:

```python
# will create/overwrite 'fool.json'
vdb.save()
```
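
A round-trip sketch pairing `save()` with the automatic reload described above:

```python
vdb.save()  # persist the index to 'fool.json'
# re-opening with the same storage_file loads the saved index
vdb_reloaded = NanoVectorDB(fake_dim, storage_file="fool.json")
```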
**Get, Delete**:

```python
# get and delete the inserted data
print(vdb.get(r["insert"]))
vdb.delete(r["insert"])
```
Embedding Dim: 1024. Device: MacBook M3 Pro

- Saving an index with `100,000` vectors generates a roughly 520 MB JSON file.
- Inserting `100,000` vectors takes roughly `2` s.
- Querying `100,000` vectors takes roughly `0.1` s.
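
A minimal sketch to reproduce these numbers on your own machine (timings will vary with hardware):

```python
import time

import numpy as np
from nano_vectordb import NanoVectorDB

dim, n = 1024, 100_000
vdb = NanoVectorDB(dim, storage_file="bench.json")
data = [{"__vector__": v} for v in np.random.rand(n, dim)]

t0 = time.perf_counter()
vdb.upsert(data)
print(f"insert: {time.perf_counter() - t0:.2f}s")

t0 = time.perf_counter()
vdb.query(np.random.rand(dim))
print(f"query: {time.perf_counter() - t0:.3f}s")
```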