pajaskowiak/dbcv

dataset_1.txt noise examples appear to be labeled as '-1'

FelSiq opened this issue

Hello,

I know the example in the package README is a synthetic one intended to showcase a basic program execution. However, I believe the reported score may be a bit misleading: in the dataset used, "dataset_1.txt", the noise instances appear to be labeled as "-1", not "0" as the package implementation assumes. As far as I understand, this means the noise instances are treated as a real cluster during the DBCV computation, which substantially changes the estimated score (reported estimate = 0.6149; estimate with labels fixed = 0.8576).
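The label fix described above can be sketched as follows. This is only a minimal illustration in Python, assuming the labels are already loaded into an integer array and that the only change needed is mapping "-1" to "0"; `relabel_noise` is a hypothetical helper, not something the package provides.

```python
import numpy as np

def relabel_noise(labels, source_noise=-1, target_noise=0):
    """Return a copy of `labels` with the noise marker remapped.

    Hypothetical helper: assumes noise is marked with `source_noise`
    and that the scoring code expects `target_noise` instead.
    """
    labels = np.asarray(labels).copy()
    has_source = np.any(labels == source_noise)
    # Refuse to merge noise into a label that already names a real cluster.
    if has_source and source_noise != target_noise and np.any(labels == target_noise):
        raise ValueError("target noise label is already used by a cluster")
    labels[labels == source_noise] = target_noise
    return labels

# Toy labels (not taken from dataset_1.txt): two clusters plus noise.
example = np.array([1, 1, 2, -1, 2, -1])
fixed = relabel_noise(example)
```

The collision check matters because silently merging noise into an existing cluster "0" would reproduce the same kind of distortion in the opposite direction.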

The same issue presumably applies to all other example datasets.

Also, am I correct in assuming that the distance metric used during the DBCV computation is the squared Euclidean distance? I understand this is a legitimate choice; I just want to confirm whether my understanding is correct.
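To make the question concrete, here is a small Python sketch of the two metrics being compared. It is not taken from the package; `pairwise_sq_euclidean` is a hypothetical helper and the points are made up. Squaring preserves the ordering of distances but not their magnitudes, so a score built from ratios of distances would generally differ between the two.

```python
import numpy as np

def pairwise_sq_euclidean(X):
    """Squared Euclidean distance between every pair of rows in X."""
    X = np.asarray(X, dtype=float)
    # Broadcasting gives an (n, n, d) array of coordinate differences.
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)

# Toy points (illustrative only).
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
sq = pairwise_sq_euclidean(points)  # squared Euclidean
eu = np.sqrt(sq)                    # plain Euclidean

# Same ordering of neighbours, different ratios between distances.
ratio_sq = sq[0, 2] / sq[0, 1]
ratio_eu = eu[0, 2] / eu[0, 1]
```

For these points the farther neighbour is 2 times as far under the Euclidean distance but 4 times as far under the squared one.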