Omni Embedding Service (OES)

Primary language: Rust. License: Apache-2.0.

OES is a self-hostable embeddings service. It allows you to embed data of various types (text, image, audio, etc.) for applications such as RAG, search, model training, etc.

Quick Start

Create a config.yaml file like so:

---
models:
- model_name: openai/clip-vit-base-patch32
  encodings:
  - data_type: text
    replicas: 1
  - data_type: image
    replicas: 1

Then start the service:

cargo run -- run
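Assuming the same schema, the config can presumably list several models, each with its own encodings and replica counts. The second model name and replica counts below are hypothetical, purely to illustrate the shape:

```yaml
---
models:
- model_name: openai/clip-vit-base-patch32
  encodings:
  - data_type: text
    replicas: 1
  - data_type: image
    replicas: 2   # hypothetical: scale the image encoder to two replicas
- model_name: sentence-transformers/all-MiniLM-L6-v2   # hypothetical second model
  encodings:
  - data_type: text
    replicas: 1
```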

Now you can embed data using the OpenAI-compatible API. Each (model, data_type) pair from the config is exposed as a model named <model_name>/<data_type>:

import base64
import openai
import requests
from PIL import Image
from io import BytesIO
import numpy as np

client = openai.Client(api_key="sk", base_url="http://localhost:8080/oai/")

text_embedding1 = client.embeddings.create(
    model="openai/clip-vit-base-patch32/text",
    input="a cat"
)
text_embedding2 = client.embeddings.create(
    model="openai/clip-vit-base-patch32/text",
    input="a yummy potato"
)

# Encode a PIL image as a base64 PNG data URL for the embeddings API.
def image_to_dataurl(image):
    buffered = BytesIO()
    image.save(buffered, format="PNG")
    img_str = base64.b64encode(buffered.getvalue()).decode()
    return f"data:image/png;base64,{img_str}"

image_url = "https://www.cats.org.uk/uploads/images/featurebox_sidebar_kids/Cat-Behaviour.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)
image_embedding = client.embeddings.create(
    model="openai/clip-vit-base-patch32/image",
    input=image_to_dataurl(image)
)

emb1 = np.array(text_embedding1.data[0].embedding)
emb2 = np.array(text_embedding2.data[0].embedding)
emb3 = np.array(image_embedding.data[0].embedding)
print(f"Similarity between 'a cat' and image: {np.dot(emb1, emb3)}")
print(f"Similarity between 'a yummy potato' and image: {np.dot(emb2, emb3)}")
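Note that np.dot equals cosine similarity only when the vectors are unit-length. If the embeddings returned by the service are not already normalized (an assumption worth checking for your model), it is safer to compute cosine similarity explicitly:

```python
import numpy as np

def cosine_similarity(a, b):
    # Normalize each vector to unit length, then take the dot product.
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Parallel vectors have cosine similarity 1 regardless of magnitude.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ≈ 1.0
```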