DeMoriarty/fast_pytorch_kmeans

Calculating euc_sim()

juskuz opened this issue · 2 comments

juskuz commented

I tested the method euc_sim() and it seems that values are not squared.
Example:

a = torch.tensor([[2, 3], [3, 5], [5, 8]], dtype=torch.float32)
b = torch.tensor([[1, 0], [2, 1]], dtype=torch.float32)
>>> euc_sim(a,b)
tensor([[-10.,  -4.],
        [-29., -17.],
        [-80., -58.]])

Could anyone explain why? I suppose that sqrt doesn't change the order when looking for max value (max similarity). I compared it with torch.cdist() and there are squered values. Example (same variables a and b):

>>> torch.cdist(a,b)
tensor([[3.1623, 2.0000],
        [5.3852, 4.1231],
        [8.9443, 7.6158]])

euc_sim() computes negative squared euclidean distance. it's negative because we need the largest value to indicate most similar (closest) pair of vectors, and we're not taking sqrt of it because what matters is the order of the values, not the values themselves, so we might as well save some computation by not doing sqrt

juskuz commented

Thanks for clarification. I wanted to use the euc_sim formula for another computations (or write unit tests for myself) and the value of distance was a bit unclear.