Metric used for the estimation of dimensions
heliosdrm opened this issue · 1 comment
There is an inconsistency in the metrics used for calculating the optimal dimensions of `reconstruct`/`embed`.
In the code of the internal function `_average_a` (used by `estimate_dimension`), the distance `δ` between nearest neighbors is calculated with the infinity (maximum) norm, as in Cao's paper, e.g. `δ = norm(R2[i]-R2[j], Inf)`. However, the `KDTree` used to find those nearest neighbors is built with the default options (`tree2 = KDTree(R2)`), which means Euclidean distances (as stated in https://github.com/KristofferC/NearestNeighbors.jl). As a result, the neighbor returned by the tree is not necessarily the nearest one in the infinity norm, so `δ` may be overestimated.
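A minimal stdlib-only Julia sketch of why this matters. The point set and the helper `nearest` are hypothetical and chosen for illustration, and a brute-force search stands in for the `KDTree`. For the three points below, the nearest neighbor of the first point is the third one under the Euclidean norm but the second one under the maximum norm, so a `δ` computed with `norm(..., Inf)` on the Euclidean neighbor (1.2) is larger than the true Chebyshev nearest-neighbor distance (1.0).

```julia
using LinearAlgebra

# Hypothetical helper: brute-force nearest neighbor of pts[i] among the
# other points, using an arbitrary distance function `dist`.
function nearest(pts::Vector{Vector{Float64}}, i::Int, dist)
    best_j, best_d = 0, Inf
    for j in eachindex(pts)
        j == i && continue            # skip the query point itself
        d = dist(pts[i], pts[j])
        if d < best_d
            best_j, best_d = j, d
        end
    end
    return best_j
end

euclid(a, b) = norm(a - b)        # 2-norm, the KDTree default
cheb(a, b)   = norm(a - b, Inf)   # maximum norm, as used for δ

pts = [[0.0, 0.0], [1.0, 1.0], [1.2, 0.0]]
j_euc  = nearest(pts, 1, euclid)  # neighbor a Euclidean search returns
j_cheb = nearest(pts, 1, cheb)    # neighbor a Chebyshev search returns

# δ as computed in the current code (max norm on the Euclidean neighbor)
# versus the actual Chebyshev nearest-neighbor distance
δ_mixed      = cheb(pts[1], pts[j_euc])
δ_consistent = cheb(pts[1], pts[j_cheb])
```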
The `KDTree` should be built with the `Chebyshev` metric to be consistent. Another option (which I advocate) is to let the user choose which `Metric` is used. PR #9 includes code to facilitate this (only for the calculation of `δ`, not for the `KDTree`).
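A rough sketch of the "let the user choose the metric" option, assuming a brute-force search and a plain distance function instead of a Distances.jl `Metric`. The functions `average_a_sketch` and `delayvec` are hypothetical (not the package API), and there is no Theiler window. The point is that a single `dist` is used both to pick the neighbor n(i,d) and to compute `δ` and Cao's ratio a(i,d), whose mean over i is E(d).

```julia
using LinearAlgebra

# Hypothetical helper: delay vector y_i(d) = (s[i], s[i+τ], ..., s[i+(d-1)τ]).
delayvec(s, i, τ, d) = [s[i + k*τ] for k in 0:d-1]

# Hypothetical sketch of Cao's E(d) = mean over i of a(i,d), where the SAME
# distance function `dist` selects the nearest neighbor and computes the ratio.
# Brute-force O(M^2) search; assumes no two delay vectors coincide (δ > 0).
function average_a_sketch(s::AbstractVector{<:Real}, τ::Int, d::Int;
                          dist = (a, b) -> norm(a - b, Inf))
    M = length(s) - d*τ               # indices i for which y_i(d+1) exists
    Yd  = [delayvec(s, i, τ, d)     for i in 1:M]
    Yd1 = [delayvec(s, i, τ, d + 1) for i in 1:M]
    total = 0.0
    for i in 1:M
        # nearest neighbor n of y_i(d) under `dist`, excluding i itself
        n, δ = 0, Inf
        for j in 1:M
            j == i && continue
            dj = dist(Yd[i], Yd[j])
            if dj < δ
                n, δ = j, dj
            end
        end
        total += dist(Yd1[i], Yd1[n]) / δ   # a(i,d) with the same metric
    end
    return total / M
end
```

With `s = [0.0, 1.0, 3.0, 7.0]`, τ = 1 and d = 1, the default maximum norm gives E(1) = 2.0, while passing `dist = (a, b) -> norm(a - b)` gives √5. In the package itself, the corresponding change for the tree would be to pass the metric to the constructor, e.g. `KDTree(R2, Chebyshev())`; NearestNeighbors.jl's `KDTree` accepts Minkowski-type metrics such as `Euclidean` and `Chebyshev`.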
The authors of the other methods (FNN, FFNN) used Euclidean distances in their papers. For FNN this is the only option consistent with Kennel's equations [4-5], but for FFNN other metrics could be used without any problem.
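For reference, the usual statement of Kennel's criterion shows why it is tied to the Euclidean norm (notation here is illustrative: $x^{(r)}$ is the nearest neighbor of delay vector $n$ in dimension $d$, $\tau$ the delay, $R_{\text{tol}}$ the threshold). The second line is a Pythagorean decomposition that only the Euclidean norm satisfies, and the test in the third line is derived from it. With the maximum norm the decomposition fails, since then $R_{d+1}(n) = \max\left(R_d(n), \left|x(n+d\tau) - x^{(r)}(n+d\tau)\right|\right)$.

```math
\begin{aligned}
R_d^2(n) &= \sum_{k=0}^{d-1} \left[x(n+k\tau) - x^{(r)}(n+k\tau)\right]^2 \\
R_{d+1}^2(n) &= R_d^2(n) + \left[x(n+d\tau) - x^{(r)}(n+d\tau)\right]^2 \\
\left[\frac{R_{d+1}^2(n) - R_d^2(n)}{R_d^2(n)}\right]^{1/2} &= \frac{\left|x(n+d\tau) - x^{(r)}(n+d\tau)\right|}{R_d(n)} > R_{\text{tol}}
\end{aligned}
```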