cui-unige/mcc4mcc

Fix the distance definition

petitgrizzlies opened this issue · 7 comments

diff = x_1 - x_2
return max(diff, 2)

Fixed by 553b2cc

Added a FIXME for @mencattini in 30e0969 to improve (if needed) the definition of the distance in knn.

I'm not sure I understand the FIXME. This code "clips" a value only if it becomes greater than the bound.
Let's take some examples:
x1 = [1,0,-1,-1], x2 = [1,1,-1,1], then distance = sum([0,1,0,2]).
x1 = [1, 20, 1], x2 = [1, 10, 1], then distance = sum([0, 10, 0]), where [0, 10, 0] is "clipped" to [0, 2, 0].
x1 = [1, 10, -1, 0], x2 = [1, 5, 1, 1], then distance = sum([0, 5, 2, 1]), where [0, 5, 2, 1] is "clipped" to [0, 2, 2, 1].
By construction of the array, the value for binary components can't exceed 2.
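The three examples above can be reproduced with a minimal sketch. It assumes that "clipping" means taking the minimum of each absolute per-component difference and a bound of 2; the function name `clipped_distance` and the `bound` parameter are hypothetical and are not taken from the repository.

```python
def clipped_distance(x1, x2, bound=2):
    """Sum of per-component absolute differences, each capped at `bound`.

    Assumption: "clipping" caps each component at `bound` (2 in this thread).
    """
    return sum(min(abs(a - b), bound) for a, b in zip(x1, x2))


# The three examples from the comment above:
print(clipped_distance([1, 0, -1, -1], [1, 1, -1, 1]))  # 0 + 1 + 0 + 2
print(clipped_distance([1, 20, 1], [1, 10, 1]))         # 0 + 2 + 0
print(clipped_distance([1, 10, -1, 0], [1, 5, 1, 1]))   # 0 + 2 + 2 + 1
```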

Did I misunderstand the initial query?

If you have the vectors x1 = [1,0,1,2] and x2 = [1,1,-1,3], then the result should be [0,1,2,2]: there is a distance of 2 between 1 and -1, and the last component also gets a distance of 2, because its values are greater than 1.
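One possible reading of this rule, as a sketch only: components whose values are both in {-1, 0, 1} use the absolute difference, and any other component counts as 2 when the values differ. The 0 returned for equal non-ternary values is an assumption (the thread does not cover that case), and the names `TERNARY`, `component_distance` and `distances` are hypothetical.

```python
TERNARY = (-1, 0, 1)


def component_distance(a, b):
    """Distance between two components under the reading described above."""
    if a in TERNARY and b in TERNARY:
        # Ternary components: plain absolute difference (0, 1 or 2).
        return abs(a - b)
    # Assumption: non-ternary values count as 2 when they differ, 0 when equal.
    return 0 if a == b else 2


def distances(x1, x2):
    """Per-component distances; summing them gives the overall distance."""
    return [component_distance(a, b) for a, b in zip(x1, x2)]


print(distances([1, 0, 1, 2], [1, 1, -1, 3]))  # the example vectors above
```

Under this reading the last component (2 versus 3) contributes 2, whereas capping the absolute difference at 2 would give 1 for that component.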

OK. I didn't understand the metric that way.
But is it a problem when only one feature isn't in ternary form?

For instance?

The distance is fixed by 6781f39. Can you check? (I explain the algorithm in the comments.)