Fix the distance definition
petitgrizzlies opened this issue · 7 comments
diff = x_1 - x_2
return max(diff, 2)
Add a FIXME for @mencattini in 30e0969 to improve (if needed) the definition of the distance in knn.
I'm not sure I understand the FIXME. This code "clips" the value only if it becomes greater than the bound.
Let's take an example:
x1 = [1, 0, -1, -1], x2 = [1, 1, -1, 1], then distance = sum([0, 1, 0, 2]).
x1 = [1, 20, 1], x2 = [1, 10, 1], then distance = sum([0, 10, 0]), which is "clipped" to sum([0, 2, 0]).
x1 = [1, 10, -1, 0], x2 = [1, 5, 1, 1], then distance = sum([0, 5, 2, 1]), which is "clipped" to sum([0, 2, 2, 1]).
By construction of the array, the difference between two binary components can't exceed 2.
Did I misunderstand the initial query?
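The clipping behaviour described in the three examples above could be sketched as follows. This is only an illustration of that reading, not the code in the repository: the function name `clipped_distance` and the `bound` parameter are hypothetical, and taking the absolute difference and capping it with `min` is an assumption about the intended behaviour.

```python
def clipped_distance(x1, x2, bound=2):
    """Sum of per-component absolute differences, each capped at `bound`.

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same length")
    # min(..., bound) caps each component, so a large difference counts as `bound`.
    return sum(min(abs(a - b), bound) for a, b in zip(x1, x2))


print(clipped_distance([1, 0, -1, -1], [1, 1, -1, 1]))  # 3
print(clipped_distance([1, 20, 1], [1, 10, 1]))         # 2
print(clipped_distance([1, 10, -1, 0], [1, 5, 1, 1]))   # 5
```

Under this reading, components that already lie in {-1, 0, 1} are never affected by the cap, since their difference is at most 2.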
If you have the following vectors: x1 = [1,0,1,2] and x2 = [1,1,-1,3], then the result should be [0,1,2,2], because there is a distance of 2 between -1 and 1, but also because there is a distance of 2 for the last part, because values are greater than 1.
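One possible reading of this rule, sketched under explicit assumptions: a component contributes a fixed distance of 2 whenever either value falls outside the ternary range [-1, 1], and the absolute difference otherwise. The comment only mentions values "greater than 1", so treating values below -1 the same way is an assumption, as is the hypothetical function name `component_distances`.

```python
def component_distances(x1, x2):
    """Per-component distances for one reading of the rule above.

    Assumption: any component where either value lies outside [-1, 1]
    counts as a distance of 2; ternary components use the absolute difference.
    """
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same length")
    distances = []
    for a, b in zip(x1, x2):
        if abs(a) > 1 or abs(b) > 1:
            distances.append(2)           # non-ternary component: fixed distance
        else:
            distances.append(abs(a - b))  # ternary component: 0, 1 or 2
    return distances


print(component_distances([1, 0, 1, 2], [1, 1, -1, 3]))  # [0, 1, 2, 2]
```

This differs from plain clipping: the last component has a raw difference of only 1 (between 2 and 3), yet it still counts as 2.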
OK, I hadn't understood the metric that way.
But is it a problem when only one feature isn't in ternary form?
For instance?
The distance is fixed by 6781f39. Can you check? (I explain the algorithm in the comments.)