The Upstash Vector Python client
Note
This project is in the GA stage.
Upstash Professional Support fully covers this project. It receives regular updates and bug fixes. The Upstash team is committed to maintaining and improving its functionality.
Install a released version from pip:
pip3 install upstash-vector
To use this client, head over to the Upstash Console and create a vector database. Then copy the URL and the TOKEN from the dashboard.
from upstash_vector import Index
index = Index(url=UPSTASH_VECTOR_REST_URL, token=UPSTASH_VECTOR_REST_TOKEN)
or, alternatively, initialize the client from environment variables:
export UPSTASH_VECTOR_REST_URL=[URL]
export UPSTASH_VECTOR_REST_TOKEN=[TOKEN]
from upstash_vector import Index
index = Index.from_env()
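Before calling `Index.from_env()`, it can help to confirm that both variables are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_upstash_vars` is hypothetical and not part of the client.

```python
import os

# The two variable names come from the export lines above.
REQUIRED_VARS = ("UPSTASH_VECTOR_REST_URL", "UPSTASH_VECTOR_REST_TOKEN")


def missing_upstash_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    # Hypothetical helper for illustration; it never contacts Upstash.
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_upstash_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Environment is ready for Index.from_env()")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.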
There are a couple of ways to upsert vectors. Feel free to use whichever one feels the most comfortable.
index.upsert(
vectors=[
("id1", [0.1, 0.2], {"metadata_field": "metadata_value"}),
("id2", [0.3, 0.4]),
]
)
index.upsert(
vectors=[
{"id": "id3", "vector": [0.1, 0.2], "metadata": {"metadata_f": "metadata_v"}},
{"id": "id4", "vector": [0.5, 0.6]},
]
)
from upstash_vector import Vector
index.upsert(
vectors=[
Vector(id="id5", vector=[1, 2], metadata={"metadata_f": "metadata_v"}),
Vector(id="id6", vector=[6, 7]),
]
)
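When there are many vectors to upsert, one option is to send them in smaller batches. The sketch below only assumes the `index.upsert(vectors=...)` call shown above; the helper names and the default batch size of 100 are illustrative assumptions, not limits documented by the client.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def upsert_in_batches(index, vectors, batch_size=100):
    """Call index.upsert once per batch and return the number of calls made."""
    # batch_size=100 is an arbitrary default chosen for this sketch.
    calls = 0
    for batch in batched(vectors, batch_size):
        index.upsert(vectors=batch)
        calls += 1
    return calls
```

The batches can hold any of the three forms shown above (tuples, dictionaries, or `Vector` objects), since they are passed through to `upsert` unchanged.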
If you are using an Upstash Vector index with an embedding model, you can upsert data directly as a string:
from upstash_vector import Data
res = index.upsert(
vectors=[
Data(id="id5", data="Goodbye-World", metadata={"metadata_f": "metadata_v"}),
Data(id="id6", data="Hello-World"),
]
)
query_vector = [0.6, 0.9]
top_k = 6
query_res = index.query(
vector=query_vector,
top_k=top_k,
include_vectors=True,
include_metadata=True,
filter="metadata_f = 'metadata_v'"
)
# query_res is a list of vectors with scores:
# query_res[n].id: The identifier associated with the matching vector.
# query_res[n].score: A measure of similarity indicating how closely the vector matches the query vector.
# query_res[n].vector: The vector itself (included only if `include_vectors` is set to `True`).
# query_res[n].metadata: Additional information or attributes linked to the matching vector.
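As a small illustration of working with these fields, the sketch below keeps only the results above a score threshold. It relies on nothing but the `id` and `score` attributes described above; `filter_by_score` is a hypothetical helper, and the sample objects are stand-ins so the snippet runs without a database.

```python
from types import SimpleNamespace


def filter_by_score(results, min_score):
    """Keep (id, score) pairs for results scoring at least min_score."""
    return [(r.id, r.score) for r in results if r.score >= min_score]


# Stand-in objects with the same attributes as the entries of query_res.
sample = [
    SimpleNamespace(id="id3", score=0.98),
    SimpleNamespace(id="id4", score=0.41),
]
print(filter_by_score(sample, 0.5))  # prints [('id3', 0.98)]
```

With a real index, `query_res` from the call above would take the place of `sample`.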
If you are using an Upstash Vector index with an embedding model, you can query the index with text:
query_res = index.query(
data="hello",
top_k=3,
include_vectors=True,
include_metadata=True,
)
res = index.fetch(["id3", "id4"], include_vectors=True, include_metadata=True)
# res.vectors: A list containing information for each fetched vector, including `id`, `vector`, and `metadata`.
or, to fetch a single vector:
res = index.fetch("id1", include_vectors=True, include_metadata=True)
# Scans the index 3 vectors at a time until all vectors have been traversed.
res = index.range(cursor="", limit=3, include_vectors=True, include_metadata=True)
while res.next_cursor != "":
    res = index.range(cursor=res.next_cursor, limit=3, include_vectors=True, include_metadata=True)
# res.next_cursor: A cursor indicating the position to start the next range query. If "", there are no more results.
# res.vectors: A list containing information for each vector, including `id`, `vector`, and `metadata`.
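The cursor loop can also be wrapped in a generator so that callers iterate over vectors without handling cursors themselves. This is a minimal sketch that assumes only the `range` call and the `next_cursor` and `vectors` fields described above; `iter_all_vectors` is a hypothetical helper, not part of the client.

```python
def iter_all_vectors(index, page_size=3):
    """Yield every vector in the index, fetching page_size vectors per request."""
    cursor = ""  # an empty cursor starts from the beginning
    while True:
        res = index.range(
            cursor=cursor,
            limit=page_size,
            include_vectors=True,
            include_metadata=True,
        )
        yield from res.vectors
        cursor = res.next_cursor
        if cursor == "":  # an empty next_cursor means there are no more results
            return
```

Usage would look like `for vector in iter_all_vectors(index): ...`. Because each page is yielded before the next one is requested, the whole index is never held in memory at once.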
res = index.delete(["id1", "id2"])
# res.deleted: An integer indicating how many vectors were deleted with the command.
or, to delete a single vector:
res = index.delete("id1")
# Removes all the vectors that were upserted and resets the index.
index.reset()
info = index.info()
# info.vector_count: total number of vectors in the index
# info.pending_vector_count: total number of vectors waiting to be indexed
# info.index_size: total size of the index on disk in bytes
# info.dimension: how many dimensions the index has
# info.similarity_function: similarity function chosen for the index
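Since `info.pending_vector_count` reports vectors that are still waiting to be indexed, a script can poll it before querying freshly upserted data. The sketch below assumes only the `index.info()` call and the `pending_vector_count` field shown above; the helper name, timeout, and polling interval are illustrative assumptions.

```python
import time


def wait_until_indexed(index, timeout=10.0, poll_interval=0.5):
    """Poll index.info() until no vectors are pending.

    Returns True once pending_vector_count reaches 0, or False if the
    timeout (in seconds) elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if index.info().pending_vector_count == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Returning a boolean instead of raising leaves it to the caller to decide whether a timeout is an error.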
This project uses Poetry for packaging and dependency management. Make sure you are able to create the Poetry shell with the relevant dependencies.
You will also need a vector database on Upstash.
poetry install
poetry run ruff format .
To run all the tests, make sure the Poetry virtual environment is activated with all the necessary dependencies.
Create two vector databases on Upstash. The first one should have 2 dimensions. The second one should use an embedding model. Set the necessary environment variables:
URL=****
TOKEN=****
EMBEDDING_URL=****
EMBEDDING_TOKEN=****
Then, run the following command to run tests:
poetry run pytest