FEAT: add optimized cosine distance function for normalized vectors
dteare opened this issue · 1 comment
Use case
When calculating cosine similarity on normalized vectors, there is no need to calculate the magnitude of each vector, as they are already of unit length.
OpenAI embeddings are normalized, and presumably many others are as well. From the OpenAI embeddings FAQs:
OpenAI embeddings are normalized to length 1, which means that:
- Cosine similarity can be computed slightly faster using just a dot product
The cosine distance calculation at distance.rs#L46 could benefit from this optimization by skipping the last three calculations:
fn cosine(a: &Vector, b: &Vector) -> f32 {
    let dot = Self::dot(a, b);
    let ma = a.0.iter().map(|x| x.powi(2)).sum::<f32>().sqrt();
    let mb = b.0.iter().map(|y| y.powi(2)).sum::<f32>().sqrt();
    dot / (ma * mb)
}
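For unit-length vectors, `ma` and `mb` are both 1, so the division is a no-op and the dot product alone gives the cosine similarity. A minimal self-contained check of that claim (the `Vector` newtype and free functions here are a sketch, not OasysDB's actual types):

```rust
// Sketch: a newtype over Vec<f32>, mirroring the a.0 field access in the issue.
struct Vector(Vec<f32>);

// Plain dot product.
fn dot(a: &Vector, b: &Vector) -> f32 {
    a.0.iter().zip(b.0.iter()).map(|(x, y)| x * y).sum()
}

// Full cosine similarity, including the magnitude calculations.
fn cosine(a: &Vector, b: &Vector) -> f32 {
    let d = dot(a, b);
    let ma = a.0.iter().map(|x| x.powi(2)).sum::<f32>().sqrt();
    let mb = b.0.iter().map(|y| y.powi(2)).sum::<f32>().sqrt();
    d / (ma * mb)
}

fn main() {
    // Two unit-length vectors: 0.6^2 + 0.8^2 = 1.
    let a = Vector(vec![0.6, 0.8]);
    let b = Vector(vec![0.8, 0.6]);
    let full = cosine(&a, &b);
    let fast = dot(&a, &b);
    // For normalized inputs the two results agree (up to float rounding).
    assert!((full - fast).abs() < 1e-6);
    println!("cosine = {full}, dot = {fast}");
}
```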
Proposed solution
I suggest the Distance enum be expanded to include a CosineOptimizedUnitLength variant. Doing so would fit in nicely with the current config:
let mut config = Config::default();
// Using optimized calculation as our embeddings are normalized
config.distance = Distance::CosineOptimizedUnitLength;
The Distance::calculate function could then match on this variant and call an optimized method:
fn cosine_normalized(a: &Vector, b: &Vector) -> f32 {
    Self::dot(a, b)
}
Additional Context
Not applicable / did so already inline where appropriate.
Hi, thank you for bringing this to my attention. I read the link you provided, and I think we can add that to OasysDB.
Also, thank you for providing the solution. I will work with it 😁