[Bug] Wrong wording on the Cosine Similarity doc page
Closed this issue · 5 comments
Context / Scenario
Read the document Cosine Similarity.
What happened?
The document Cosine Similarity contains the following text:
Cosine similarity is particularly useful when working with high-dimensional data such as word embeddings because it takes into account both the magnitude and direction of each vector. This makes it more robust than other measures like Euclidean distance, which only considers the magnitude.
Both sentences are far enough from the truth that they need to be rewritten.
Importance
a fix would make my life easier
Platform, Language, Versions
KM Version 0.62.
Relevant log output
No response
Any suggestion about how to improve the text?
I'm not a good English writer. Sorry for wasting time on basic math.
- Cosine similarity does not take magnitude into account. The dot product divided by the product of the magnitudes is the cosine of the angle between the two vectors.
- Euclidean distance always depends on the angle between the vectors. The Euclidean distance between vectors (2) and (1) is 1, while the Euclidean distance between vectors (2) and (-1) is 3. The Euclidean distance between two arbitrary radius vectors of a circle is not necessarily zero.
- Cosine similarity is often preferred because it compensates for magnitudes.
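The one-dimensional examples in the list above can be checked with a short sketch. The helper names `euclidean_distance` and `cosine_similarity` are made up for this illustration and are not taken from the KM codebase.

```python
import math

def euclidean_distance(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Dot product divided by the product of the two magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(euclidean_distance([2], [1]))   # 1.0
print(euclidean_distance([2], [-1]))  # 3.0
print(cosine_similarity([2], [1]))    # 1.0  (same direction)
print(cosine_similarity([2], [-1]))   # -1.0 (opposite direction)
```

The distance changes with both the sign and the size of the second vector, while the similarity only reflects whether the two vectors point the same way.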
The sentence about cosine similarity seems OK to me; there's only a typo in the Euclidean part at the end. This change should be sufficient:
Cosine similarity is particularly useful when working with high-dimensional data such as word embeddings because it takes into account both the magnitude and direction of each vector. This makes it more robust than other measures like Euclidean distance, which only considers the ~~magnitude~~ direction.
Given two n-dimensional points A and B
- A is a point (A1, A2, ..., An)
- B is a point (B1, B2, ..., Bn)
Cosine similarity:
$$\cos(\theta) = \frac{A \cdot B}{|A| \, |B|}$$
Where:
- |A| is the magnitude of vector A
- |B| is the magnitude of vector B
so cosine similarity does take into account the magnitude, as mentioned.
Euclidean distance:
$$d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}$$
so Euclidean distance considers only the direction, not the magnitude -- this is the part to fix.
Docs updated
I'm sorry for wasting your time on basic math, but it is important.
In a Euclidean space
In fact,
Given five vectors
Magnitudes are
Euclidean distances ρ are
Cosine similarities σ are
A, B, and C have the same direction but different magnitudes. Their Euclidean distances differ, while their cosine similarities are equal. Euclidean distance takes into account both direction and magnitude.
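A minimal sketch of the same argument in Python. The three vectors below are hypothetical stand-ins chosen for this illustration (they are not the five vectors from the comment above), and `cosine` is a made-up helper name. It relies on `math.dist` and multi-argument `math.hypot`, which need Python 3.8 or later.

```python
import math

# Hypothetical vectors: same direction, different magnitudes.
A = (1.0, 2.0)
B = (2.0, 4.0)
C = (4.0, 8.0)

def cosine(u, v):
    # Dot product divided by the product of the magnitudes.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

pairs = {"AB": (A, B), "AC": (A, C), "BC": (B, C)}

# Euclidean distance rho and cosine similarity sigma for each pair.
rho = {name: math.dist(u, v) for name, (u, v) in pairs.items()}
sigma = {name: cosine(u, v) for name, (u, v) in pairs.items()}

for name in pairs:
    print(f"{name}: rho = {rho[name]:.3f}, sigma = {sigma[name]:.3f}")
# AB: rho = 2.236, sigma = 1.000
# AC: rho = 6.708, sigma = 1.000
# BC: rho = 4.472, sigma = 1.000
```

With these inputs the distances all differ even though the direction never changes, so the distance must be reacting to magnitude, while the similarity stays at 1.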