paobranco/UBL

Error: Can not compute Euclidean distance with nominal attributes


Thanks for all your work on this useful package! I was surprised to see that the Euclidean distance cannot be used with a formula whose predictors are all numeric. SmoteClassif appears to check whether the dataset contains any factors, even ones that are not used in the formula. That may be by design, so I'm just reporting it in case you consider it a bug.

library("UBL")
Loading required package: MBA
Loading required package: gstat
Registered S3 method overwritten by 'xts':
  method     from
  as.zoo.xts zoo
Loading required package: automap
Loading required package: sp
Loading required package: randomForest
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
library(MASS)
data(cats)
head(cats)
Sex Bwt Hwt
1 F 2.0 7.0
2 F 2.0 7.4
3 F 2.0 9.5
4 F 2.1 7.2
5 F 2.1 7.3
6 F 2.1 7.6
length(cats$Sex)
[1] 144

I'm adding a factor for color:

cats$color <- gl(n = 2, k = 1, length = 144, labels = c("black", "white"))
head(cats)
Sex Bwt Hwt color
1 F 2.0 7.0 black
2 F 2.0 7.4 white
3 F 2.0 9.5 black
4 F 2.1 7.2 white
5 F 2.1 7.3 black
6 F 2.1 7.6 white

The formula does not use color, but SmoteClassif still stops with an error:

mysmote.cats <- SmoteClassif(Sex ~ Bwt + Hwt, cats, list(M = 0.8, F = 1.8))
Error in neighbours(tgt, dat, dist, p, k) :
Can not compute Euclidean distance with nominal attributes!
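
A quick way to see which columns are probably triggering this is to list the nominal columns other than the target. This is only a sketch in base R: it assumes the check looks at every non-target column of the data frame rather than only those named in the formula, which is what the behaviour above suggests. `nominal_cols` is a hypothetical helper, not part of UBL, and `cats_demo` is a small stand-in with made-up values, not the real cats data.

```r
# Hypothetical helper (not part of UBL): names of the factor/character
# columns in `data`, excluding the target variable.
nominal_cols <- function(data, target) {
  is_nominal <- vapply(
    data,
    function(col) is.factor(col) || is.character(col),
    logical(1)
  )
  setdiff(names(data)[is_nominal], target)
}

# Small stand-in for the modified cats data (values are made up).
cats_demo <- data.frame(
  Sex   = factor(c("F", "F", "M", "M")),
  Bwt   = c(2.0, 2.1, 2.8, 3.0),
  Hwt   = c(7.0, 7.2, 10.1, 12.0),
  color = gl(n = 2, k = 1, length = 4, labels = c("black", "white"))
)

# Lists the nominal columns besides the target; here that is "color".
nominal_cols(cats_demo, target = "Sex")
```

If this returns anything, the Euclidean distance is likely to be rejected regardless of what the formula says.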

Switching to the HEOM distance avoids the error:

mysmote.cats <- SmoteClassif(Sex ~ Bwt + Hwt, cats, list(M = 0.8, F = 1.8), dist = "HEOM")
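
Another possible workaround, for anyone who wants to keep the Euclidean distance, is to pass only the columns named in the formula. The sketch below uses base R only and assumes the nominal-attribute check inspects just the data frame it receives; this has not been verified against UBL 0.0.6. `keep_formula_vars` is a hypothetical helper, not a UBL function, and `cats_demo` holds made-up values standing in for the cats data.

```r
# Hypothetical helper (not part of UBL): keep only the columns that appear
# in the formula. Note: all.vars() returns "." for formulas such as y ~ .,
# so this sketch only handles formulas that name their variables explicitly.
keep_formula_vars <- function(form, data) {
  data[, all.vars(form), drop = FALSE]
}

# Small stand-in for the modified cats data (values are made up).
cats_demo <- data.frame(
  Sex   = factor(c("F", "F", "M", "M")),
  Bwt   = c(2.0, 2.1, 2.8, 3.0),
  Hwt   = c(7.0, 7.2, 10.1, 12.0),
  color = gl(n = 2, k = 1, length = 4, labels = c("black", "white"))
)

form <- Sex ~ Bwt + Hwt
cats_sub <- keep_formula_vars(form, cats_demo)
names(cats_sub)  # the unused factor `color` is no longer present

# With the real data, the call from this report would then become
# (assumption, not run here):
# mysmote.cats <- SmoteClassif(form, keep_formula_vars(form, cats),
#                              list(M = 0.8, F = 1.8))
```

Subsetting leaves the original data frame untouched, so the extra factor is still available for later analysis.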

sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] UBL_0.0.6 randomForest_4.6-14 automap_1.0-14 sp_1.3-2 gstat_2.0-4
[6] MBA_0.0-9 MASS_7.3-51.4 devtools_2.2.1 usethis_1.5.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 plyr_1.8.5 compiler_3.6.2 prettyunits_1.1.1 remotes_2.1.0 tools_3.6.2
[7] xts_0.11-2 testthat_2.3.1 digest_0.6.23 pkgbuild_1.0.6 pkgload_1.0.2 memoise_1.1.0
[13] lattice_0.20-38 rlang_0.4.4 cli_2.0.1.9000 rstudioapi_0.10 curl_4.3 withr_2.1.2
[19] desc_1.2.0 fs_1.3.1 rprojroot_1.3-2 grid_3.6.2 reshape_0.8.8 spacetime_1.2-2
[25] glue_1.3.1 R6_2.4.1 processx_3.4.1 fansi_0.4.1 sessioninfo_1.1.1 callr_3.4.0
[31] magrittr_1.5 intervals_0.15.1 backports_1.1.5 ps_1.3.0 ellipsis_0.3.0 assertthat_0.2.1
[37] FNN_1.1.3 crayon_1.3.4 zoo_1.8-7