Upstash Vector Python SDK

Primary language: Python. License: MIT.

The Upstash Vector Python client

Note

This project is in GA Stage.

Upstash Professional Support fully covers this project. It receives regular updates and bug fixes, and the Upstash team is committed to maintaining and improving its functionality.

Installation

Install a released version from pip:

pip3 install upstash-vector

Usage

To use this client, head over to the Upstash Console and create a vector database. Then copy the URL and the TOKEN from the dashboard.

Initialize the client

from upstash_vector import Index

index = Index(url=UPSTASH_VECTOR_REST_URL, token=UPSTASH_VECTOR_REST_TOKEN)

or alternatively, initialize from the environment

export UPSTASH_VECTOR_REST_URL=[URL]
export UPSTASH_VECTOR_REST_TOKEN=[TOKEN]

from upstash_vector import Index

index = Index.from_env()
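
Before calling Index.from_env(), it can help to confirm that both variables are actually set. The following is a minimal sketch under that assumption; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names introduced here for illustration and are not part of the SDK.

```python
import os

# The two variable names used by the environment-based setup above.
REQUIRED_VARS = ("UPSTASH_VECTOR_REST_URL", "UPSTASH_VECTOR_REST_TOKEN")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of upstash_vector.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Environment looks ready for Index.from_env()")
```

Passing a plain dictionary instead of relying on os.environ keeps the check easy to exercise in isolation.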

Upsert Vectors

There are a couple of ways to upsert vectors. Use whichever one you find most comfortable.

# Option 1: tuples of (id, vector) or (id, vector, metadata)
index.upsert(
    vectors=[
        ("id1", [0.1, 0.2], {"metadata_field": "metadata_value"}),
        ("id2", [0.3, 0.4]),
    ]
)

# Option 2: dictionaries with "id", "vector", and optional "metadata" keys
index.upsert(
    vectors=[
        {"id": "id3", "vector": [0.1, 0.2], "metadata": {"metadata_f": "metadata_v"}},
        {"id": "id4", "vector": [0.5, 0.6]},
    ]
)

# Option 3: Vector objects
from upstash_vector import Vector

index.upsert(
    vectors=[
        Vector(id="id5", vector=[1, 2], metadata={"metadata_f": "metadata_v"}),
        Vector(id="id6", vector=[6, 7]),
    ]
)
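
When there are many vectors to upsert, one possible approach is to send them in smaller batches. The sketch below assumes only that the index object exposes `upsert(vectors=...)` as shown above; `batched`, `upsert_in_batches`, and the default batch size of 100 are hypothetical choices made for illustration, not documented SDK features or limits.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def upsert_in_batches(index, vectors, batch_size=100):
    """Call index.upsert once per batch and return the number of batches sent.

    batch_size=100 is an arbitrary illustrative default, not a documented limit.
    """
    count = 0
    for batch in batched(vectors, batch_size):
        index.upsert(vectors=batch)
        count += 1
    return count
```

Because the helper only forwards each batch to `index.upsert`, it should work with any of the vector forms shown above.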

If your Upstash Vector index was created with an embedding model, you can upsert raw text directly and let the index embed it:

from upstash_vector import Data

res = index.upsert(
    vectors=[
        Data(id="id5", data="Goodbye-World", metadata={"metadata_f": "metadata_v"}),
        Data(id="id6", data="Hello-World"),
    ]
)

Query Index

query_vector = [0.6, 0.9]
top_k = 6
query_res = index.query(
    vector=query_vector,
    top_k=top_k,
    include_vectors=True,
    include_metadata=True,
    filter="metadata_f = 'metadata_v'"
)
# query_res is a list of vectors with scores:
# query_res[n].id: The identifier associated with the matching vector.
# query_res[n].score: A measure of similarity indicating how closely the vector matches the query vector.
# query_res[n].vector: The vector itself (included only if `include_vectors` is set to `True`).
# query_res[n].metadata: Additional information or attributes linked to the matching vector.
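
A common follow-up step is to keep only the matches above some similarity threshold. The sketch below assumes each result exposes `id` and `score` attributes as described above; `filter_by_score` is a hypothetical helper, the 0.8 threshold is arbitrary, and the `SimpleNamespace` objects are stand-ins for real query results. What counts as a good score depends on the similarity function chosen for the index.

```python
from types import SimpleNamespace


def filter_by_score(results, min_score):
    """Keep results whose .score is at least min_score, preserving order."""
    return [r for r in results if r.score >= min_score]


# Stand-in objects that mimic the `id` and `score` attributes described above;
# real results would come from index.query(...).
sample = [
    SimpleNamespace(id="id1", score=0.97),
    SimpleNamespace(id="id2", score=0.42),
    SimpleNamespace(id="id3", score=0.81),
]
kept_ids = [r.id for r in filter_by_score(sample, min_score=0.8)]
print(kept_ids)  # ['id1', 'id3']
```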

If your Upstash Vector index was created with an embedding model, you can query it with text instead of a vector:

query_res = index.query(
    data="hello",
    top_k=3,
    include_vectors=True,
    include_metadata=True,
)

Fetch Vectors

res = index.fetch(["id3", "id4"], include_vectors=True, include_metadata=True)
# res.vectors: A list containing information for each fetched vector, including `id`, `vector`, and `metadata`.

or, for singular fetch:

res = index.fetch("id1", include_vectors=True, include_metadata=True)

Range over Vectors - Scan the Index

# Scans the index 3 vectors at a time, until all the vectors are traversed.
res = index.range(cursor="", limit=3, include_vectors=True, include_metadata=True)
while res.next_cursor != "":
    res = index.range(cursor=res.next_cursor, limit=3, include_vectors=True, include_metadata=True)

# res.next_cursor: A cursor indicating the position to start the next range query. If "", there are no more results.
# res.vectors: A list containing information for each vector, including `id`, `vector`, and `metadata`.
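
The cursor loop above can be wrapped in a generator so that every page, including the first, is processed in one place. This is a minimal sketch assuming `index.range` behaves as described above (it returns an object with `vectors` and `next_cursor`, where an empty string marks the last page); `iter_all_vectors` is a hypothetical helper, not part of the SDK.

```python
def iter_all_vectors(index, limit=3, **range_kwargs):
    """Yield every vector in the index, one page of `limit` items at a time.

    Assumes index.range returns an object with `vectors` and `next_cursor`,
    and that `next_cursor == ""` means there are no more pages.
    """
    cursor = ""
    while True:
        page = index.range(cursor=cursor, limit=limit, **range_kwargs)
        # Hand out the current page before deciding whether to continue.
        yield from page.vectors
        cursor = page.next_cursor
        if cursor == "":
            break
```

A call such as `for v in iter_all_vectors(index, limit=3, include_metadata=True): ...` would then visit each vector once, with the extra keyword arguments forwarded to `index.range`.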

Delete Vectors

res = index.delete(["id1", "id2"])
# res.deleted: An integer indicating how many vectors were deleted with the command.

or, for singular deletion:

res = index.delete("id1")

Reset the Index

# This removes all upserted vectors and resets the index.
index.reset()

Index Info

info = index.info()
# info.vector_count: total number of vectors in the index
# info.pending_vector_count: total number of vectors waiting to be indexed
# info.index_size: total size of the index on disk in bytes 
# info.dimension: how many dimensions the index has 
# info.similarity_function: similarity function chosen for the index
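
One way to use `info.dimension` is to validate a vector's length locally before sending it. The sketch below is an illustration only; `check_dimension` is a hypothetical helper and not part of the SDK.

```python
def check_dimension(vector, expected_dimension):
    """Return the vector unchanged, or raise ValueError on a length mismatch."""
    if len(vector) != expected_dimension:
        raise ValueError(
            f"expected {expected_dimension} dimensions, got {len(vector)}"
        )
    return vector
```

With a real index, this could be called as `check_dimension([0.1, 0.2], index.info().dimension)` before an upsert or query.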

Contributing

Preparing the environment

This project uses Poetry for packaging and dependency management. Make sure you are able to create the poetry shell with relevant dependencies.

You will also need a vector database on Upstash.

poetry install 

Code Formatting

poetry run ruff format .

Running tests

To run all the tests, make sure the Poetry virtual environment is activated with all the necessary dependencies.

Create two vector indexes on Upstash. The first should have 2 dimensions, and the second should use an embedding model. Then set the necessary environment variables:

URL=****
TOKEN=****
EMBEDDING_URL=****
EMBEDDING_TOKEN=****

Then, run the following command to run tests:

poetry run pytest