hdbscan-non-numeric argument to binary operator
Closed this issue · 24 comments
library(largeVis)
set.seed(123)
ts_matrix_elec <- elect_data %>% scale() %>% t()
visObject <- largeVis(ts_matrix_elec, n_trees = 50,
K = 10)
plot(t(visObject$coords))
clusters <- hdbscan(visObject, verbose = FALSE) # failed
Error in stats::aggregate(probs, by = list(clusters), FUN = "max")$probs - :
non-numeric argument to binary operator
gplot(clusters, t(visObject$coords))
What happened? Do you have any suggestions?
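For context on the message itself: base R raises "non-numeric argument to binary operator" whenever an arithmetic operator such as `-` receives an operand that is not numeric, for example a character vector. The sketch below only illustrates that general behavior with made-up values; it does not reproduce the largeVis failure and makes no claim about its cause.

```r
# Subtracting from a character value triggers the same base R error message.
# The values here are invented purely for illustration.
res <- tryCatch(
  "0.5" - 0.1,
  error = function(e) conditionMessage(e)
)
print(res)

# Checking operand types before doing arithmetic helps locate such failures.
left_is_numeric  <- is.numeric("0.5")  # FALSE: a character string
right_is_numeric <- is.numeric(0.1)    # TRUE
```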
Hey, can you make elect_data available and I'll take a look? Thanks.
Thank you! How can I make the data available to you?
https://www.dropbox.com/s/0v41q45yvn9ahzh/test_gm_data.csv?dl=0
I uploaded the data to Dropbox, thank you!
It is a monthly time series from January 2012 to December 2015 for 11524 individuals. I want to cluster these individuals based on the time dimension.
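A minimal sketch of the preprocessing step from the original post, using a small synthetic matrix in place of elect_data. It assumes rows are individuals and columns are the 48 months, which the thread does not confirm. As far as I understand the largeVis convention, the input matrix is expected to have one observation per column, which would explain the t() call.

```r
# Synthetic stand-in for elect_data (hypothetical): 5 individuals x 48 months.
set.seed(1)
toy <- matrix(rnorm(5 * 48), nrow = 5, ncol = 48)

# scale() standardizes each column (month); t() then puts individuals in columns.
ts_matrix <- t(scale(toy))

# After transposing, rows are months and columns are individuals.
print(dim(ts_matrix))
```

Checking dim() before calling largeVis is a cheap way to confirm that the individuals ended up in the columns.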
I can't reproduce it. Can you try the version currently in branch hotfix/twobugs and confirm whether the issue is now resolved?
Loading required package: Matrix
> library(readr)
> elect_data <- read_csv("~/Downloads/test_gm_data.csv")
Parsed with column specification:
cols(
.default = col_double()
)
See spec(...) for full column specifications.
> str(elect_data)
<snip>
> library(largeVis)
Loading required package: Rcpp
> library(magrittr)
> ts_matrix <- elect_data %>% scale() %>% t()
> visObj <- largeVis(ts_matrix, n_trees = 50, K = 10, verbose = TRUE)
Searching for neighbors.
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating edge weights...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
Estimating embeddings.
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
Warning message:
In largeVis(ts_matrix, n_trees = 50, K = 10, verbose = TRUE) :
The Distances between some neighbors are large enough to cause the calculation of p_{j|i} to overflow. Scaling the distance vector.
> plot(t(visObj$coords))
> clusters <- hdbscan(visObj, verbose = TRUE)
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
> gplot(clusters, t(visObj$coords))
Warning message:
Removed 1337 rows containing missing values (geom_segment).
>
devtools::install_github("elbamos/largeVis", ref = "hotfix/twobugs")
Yes, it works!
But why is there "NA" in the plot? I can't upload the image. Did you see that in your plot?
Yes. Points will have cluster NA if the algorithm does not put them in a cluster. You can review the documentation on the algorithm for details if you'd like.
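To see how many points were left unassigned, something like the following base R sketch may help. The label vector and coordinates are invented for illustration. With a real result, the labels would come from the hdbscan output, and the exact field that holds them (for example `clusters$clusters`) is an assumption that should be checked with str().

```r
# Hypothetical cluster labels; NA marks points the algorithm left unassigned.
labels <- factor(c(1, 1, 2, NA, 2, NA, 3))

# Tabulate cluster sizes, including the NA (unassigned) group.
print(table(labels, useNA = "ifany"))

# Count the unassigned points.
n_noise <- sum(is.na(labels))

# Keep only the points that received a cluster, e.g. before plotting.
coords <- cbind(x = seq_along(labels), y = rev(seq_along(labels)))
coords_assigned <- coords[!is.na(labels), , drop = FALSE]
```

Dropping the NA rows before plotting should also avoid the "Removed ... rows containing missing values" warning, at the cost of hiding the unassigned points.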
I'm going to close this now - feel free to reopen if anything comes up.
Thank you!
@bifeng There was a bug in the version of largeVis that you tested a week ago. The bug caused the hdbscan algorithm to fail to combine clusters that should be combined. If you try the version that I've just pushed, it should produce better results on your dataset.
I am also encountering the same problem.
> load('C:/lab/normdata.Rdata')
> library(largeVis)
> library(ggplot2)
> norm <- scale(norm)
> l <- largeVis(norm,verbose=T)
Searching for neighbors.
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating edge weights...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
Estimating embeddings.
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
> clusters <- largeVis::hdbscan(l,verbose=T)
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
********Error in stats::aggregate(probs, by = list(clusters), FUN = "max")$probs - :
non-numeric argument to binary operator
In addition: Warning message:
In largeVis(norm, verbose = T) :
The Distances between some neighbors are large enough to cause the calculation of p_{j|i} to overflow. Scaling the distance vector.
I'd rather not publicly post the data. Can I email it to you?
I installed it with
devtools::install_github("elbamos/largeVis", ref = "hotfix/twobugs")
Is this correct?
I've reinstalled from master and it's still throwing the same error.
> h <- hdbscan(vis, verbose=T)
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
********Error in stats::aggregate(probs, by = list(clusters), FUN = "max")$probs - :
non-numeric argument to binary operator
In addition: Warning message:
In largeVis(norm, verbose = T) :
The Distances between some neighbors are large enough to cause the calculation of p_{j|i} to overflow. Scaling the distance vector.
That's very odd. Can you send me the log or a screenshot of a complete session? Start from an empty environment, load largeVis, check the version, and run the commands exactly the way I did.
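One possible way to capture the requested details, sketched with base R utilities only. It is meant to be run in a freshly started R session. The `pkg` variable is just a convenience introduced here, and the guard lets the snippet run even where largeVis is not installed.

```r
# Run in a fresh R session so no stale objects or older package builds are loaded.
pkg <- "largeVis"

if (requireNamespace(pkg, quietly = TRUE)) {
  # Report the version that is actually installed in this library path.
  installed_version <- as.character(utils::packageVersion(pkg))
  message(pkg, " version: ", installed_version)
} else {
  installed_version <- NA_character_
  message(pkg, " is not installed")
}

# Print R version, platform, and attached packages for the bug report.
print(utils::sessionInfo())
```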
> load('C:/lab/normdata.Rdata')
> library(largeVis)
> library(ggplot2)
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.2.1 largeVis_0.2.2 Matrix_1.2-9
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 scales_0.4.1 compiler_3.4.0 lazyeval_0.2.0 plyr_1.8.4
[6] tools_3.4.0 gtable_0.2.0 tibble_1.3.3 Rcpp_0.12.11 grid_3.4.0
[11] rlang_0.1.1 munsell_0.4.3 lattice_0.20-35
> norm <- scale(norm)
> vis <- largeVis(norm,verbose=T)
Searching for neighbors.
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating edge weights...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
Estimating embeddings.
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
> plot(t(vis$coords))
> h <- hdbscan(vis, verbose=T)
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
********Error in stats::aggregate(probs, by = list(clusters), FUN = "max")$probs - :
non-numeric argument to binary operator
In addition: Warning message:
In largeVis(norm, verbose = T) :
The Distances between some neighbors are large enough to cause the calculation of p_{j|i} to overflow. Scaling the distance vector.