oasysai/oasysdb

FEAT: add optimized cosine distance function for normalized vectors

dteare opened this issue · 1 comment

Use case

When calculating cosine similarity on normalized vectors, there is no need to compute each vector's magnitude, since both are already of unit length.

OpenAI embeddings are normalized, and presumably many others are as well. From the OpenAI embeddings FAQs:

OpenAI embeddings are normalized to length 1, which means that:

  • Cosine similarity can be computed slightly faster using just a dot product

The cosine calculation at distance.rs#L46 could benefit from this optimization by skipping its last three steps (the two magnitude computations and the final division):

    fn cosine(a: &Vector, b: &Vector) -> f32 {
        let dot = Self::dot(a, b);
        let ma = a.0.iter().map(|x| x.powi(2)).sum::<f32>().sqrt();
        let mb = b.0.iter().map(|y| y.powi(2)).sum::<f32>().sqrt();
        dot / (ma * mb)
    }
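As a rough check of the claim, here is a minimal standalone sketch showing that the full cosine and a plain dot product agree once both inputs have unit length. The `Vector` type here is a hypothetical stand-in (assumed to wrap a `Vec<f32>`, based on the `a.0.iter()` access above), and `normalize` is a helper invented for illustration, not part of OasysDB:

```rust
/// Hypothetical stand-in for OasysDB's `Vector` (assumed to wrap `Vec<f32>`).
pub struct Vector(pub Vec<f32>);

/// Plain dot product of two equal-length vectors.
pub fn dot(a: &Vector, b: &Vector) -> f32 {
    a.0.iter().zip(b.0.iter()).map(|(x, y)| x * y).sum()
}

/// Full cosine similarity, mirroring the function quoted above.
pub fn cosine(a: &Vector, b: &Vector) -> f32 {
    let ma = a.0.iter().map(|x| x.powi(2)).sum::<f32>().sqrt();
    let mb = b.0.iter().map(|y| y.powi(2)).sum::<f32>().sqrt();
    dot(a, b) / (ma * mb)
}

/// Illustrative helper: scales a vector to unit length.
/// Assumes the input is non-zero; a zero vector would divide by zero.
pub fn normalize(v: &Vector) -> Vector {
    let m = v.0.iter().map(|x| x.powi(2)).sum::<f32>().sqrt();
    Vector(v.0.iter().map(|x| x / m).collect())
}
```

With `a = normalize(&Vector(vec![3.0, 4.0]))` and `b = normalize(&Vector(vec![1.0, 2.0]))`, `cosine(&a, &b)` and `dot(&a, &b)` should match up to floating-point rounding, because `ma` and `mb` are both approximately 1.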

Proposed solution

I suggest the Distance enum be expanded to include a CosineOptimizedUnitLength variant. Doing so would fit in nicely with the current config:

    let mut config = Config::default();

    // Using optimized calculation as our embeddings are normalized
    config.distance = Distance::CosineOptimizedUnitLength;

The Distance::calculate function could then match on this and call an optimized method:

    fn cosine_normalized(a: &Vector, b: &Vector) -> f32 {
        Self::dot(a, b)
    }
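To make the proposal concrete, the dispatch could look roughly like the sketch below. This is only an illustration under assumptions: `Vector` is a hypothetical stand-in wrapping `Vec<f32>`, only the two variants relevant to this issue are shown, and the actual shape of `Distance::calculate` in OasysDB may differ:

```rust
/// Hypothetical stand-in for OasysDB's `Vector` (assumed to wrap `Vec<f32>`).
pub struct Vector(pub Vec<f32>);

/// Only the variants relevant to this proposal are shown;
/// the real enum may contain others.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Distance {
    Cosine,
    CosineOptimizedUnitLength,
}

impl Distance {
    /// Assumed signature: picks the calculation based on the variant.
    pub fn calculate(&self, a: &Vector, b: &Vector) -> f32 {
        match self {
            Distance::Cosine => Self::cosine(a, b),
            Distance::CosineOptimizedUnitLength => Self::cosine_normalized(a, b),
        }
    }

    fn dot(a: &Vector, b: &Vector) -> f32 {
        a.0.iter().zip(b.0.iter()).map(|(x, y)| x * y).sum()
    }

    /// General case: divides the dot product by both magnitudes.
    fn cosine(a: &Vector, b: &Vector) -> f32 {
        let ma = a.0.iter().map(|x| x.powi(2)).sum::<f32>().sqrt();
        let mb = b.0.iter().map(|y| y.powi(2)).sum::<f32>().sqrt();
        Self::dot(a, b) / (ma * mb)
    }

    /// Unit-length case: the magnitudes are 1, so the dot product suffices.
    fn cosine_normalized(a: &Vector, b: &Vector) -> f32 {
        Self::dot(a, b)
    }
}
```

One caveat with this design is that the optimized variant trusts the caller: if the vectors are not actually normalized, it returns the raw dot product rather than the cosine, so the two variants only agree on unit-length input.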

Additional Context

Not applicable; the relevant context is already included inline above.

Hi, thank you for bringing this information to my attention. I read the link you gave me and I think we can add that to OasysDB.

Also, thank you for providing the solution. I will work with it 😁