model failures
topepo opened this issue · 2 comments
I have had a few cases where it appears that the objective function is nan. This has occurred when too much regularization is used but, in the example below, there is no penalty applied. This uses RcppML 0.1.0.
library(RcppML)
library(modeldata)
library(Matrix)
data(biomass)
biomass <- biomass[, 3:7]
biomass <- as.matrix(biomass)
biomass <- t(biomass)
biomass <- Matrix(biomass, sparse = TRUE)
res <- nmf(biomass, k = 5, seed = 1)
#>
#> iter | tol
#> ---------------
#> 1 | nan
#> 2 | nan
#> 3 | nan
#> 4 | nan
#> 5 | nan
#> 6 | nan
#> 7 | nan
#> 8 | nan
#> 9 | nan
#> 10 | nan
#> 11 | nan
#> 12 | nan
#> 13 | nan
#> 14 | nan
#> 15 | nan
#> 16 | nan
#> 17 | nan
#> 18 | nan
#> 19 | nan
#> 20 | nan
#> 21 | nan
#> 22 | nan
#> 23 | nan
#> 24 | nan
#> 25 | nan
#> 26 | nan
#> 27 | nan
#> 28 | nan
#> 29 | nan
#> 30 | nan
#> 31 | nan
#> 32 | nan
#> 33 | nan
#> 34 | nan
#> 35 | nan
#> 36 | nan
#> 37 | nan
#> 38 | nan
#> 39 | nan
#> 40 | nan
#> 41 | nan
#> 42 | nan
#> 43 | nan
#> 44 | nan
#> 45 | nan
#> 46 | nan
#> 47 | nan
#> 48 | nan
#> 49 | nan
#> 50 | nan
#> 51 | nan
#> 52 | nan
#> 53 | nan
#> 54 | nan
#> 55 | nan
#> 56 | nan
#> 57 | nan
#> 58 | nan
#> 59 | nan
#> 60 | nan
#> 61 | nan
#> 62 | nan
#> 63 | nan
#> 64 | nan
#> 65 | nan
#> 66 | nan
#> 67 | nan
#> 68 | nan
#> 69 | nan
#> 70 | nan
#> 71 | nan
#> 72 | nan
#> 73 | nan
#> 74 | nan
#> 75 | nan
#> 76 | nan
#> 77 | nan
#> 78 | nan
#> 79 | nan
#> 80 | nan
#> 81 | nan
#> 82 | nan
#> 83 | nan
#> 84 | nan
#> 85 | nan
#> 86 | nan
#> 87 | nan
#> 88 | nan
#> 89 | nan
#> 90 | nan
#> 91 | nan
#> 92 | nan
#> 93 | nan
#> 94 | nan
#> 95 | nan
#> 96 | nan
#> 97 | nan
#> 98 | nan
#> 99 | nan
#> 100 | nan
all(is.na(res$w))
#> [1] TRUE
Created on 2021-09-09 by the reprex package (v2.0.0)
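Before looking at the solver, it may help to rule out the input itself. The following is a minimal sketch, not part of RcppML: check_nmf_input is a hypothetical helper that counts non-finite and negative entries and lists all-zero rows and columns, any of which could plausibly contribute to a nan objective.

```r
# Hypothetical helper (not part of RcppML): summarise properties of a
# matrix that are worth ruling out before factorising it.
check_nmf_input <- function(A) {
  A <- as.matrix(A)  # densify; fine for small inputs such as this one
  list(
    n_nonfinite = sum(!is.finite(A)),
    n_negative  = sum(A < 0, na.rm = TRUE),
    zero_rows   = which(rowSums(A != 0, na.rm = TRUE) == 0),
    zero_cols   = which(colSums(A != 0, na.rm = TRUE) == 0)
  )
}

# Toy example: row 2 is all zeros and one entry is missing
toy <- matrix(c(1, 0, 2,
                NA, 0, 3), nrow = 3)
str(check_nmf_input(toy))
```

With the Matrix package loaded, the same call should also accept the sparse biomass matrix from the reprex, since as.matrix() densifies it first.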
@topepo Thanks for raising this issue; clearly it's causing you some trouble. Unfortunately, I could not reproduce your example using the dataset as supplied on two different machines (Windows and CentOS). I tried all seeds between 1 and 10000 for k = 5. In no case was all(is.na(res$w)) == TRUE.
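A seed sweep of this kind could be sketched roughly as follows. find_failing_seeds and fake_fit are hypothetical names introduced here for illustration. The fit is passed in as a function, so the sketch runs without RcppML; in practice it would wrap the nmf() call from the reprex.

```r
# Hypothetical helper: `fit` is any function of a seed that returns a list
# with a `w` matrix, e.g. function(s) nmf(biomass, k = 5, seed = s).
find_failing_seeds <- function(fit, seeds) {
  failed <- vapply(seeds, function(s) all(is.na(fit(s)$w)), logical(1))
  seeds[failed]
}

# Stand-in fit so the sketch runs on its own: it "fails" only for seed 3
fake_fit <- function(s) list(w = matrix(if (s == 3) NA_real_ else 1, 2, 2))
find_failing_seeds(fake_fit, 1:5)
```

An empty result would correspond to the outcome described above, where no seed produced an all-NA w.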
Yes, too much regularization can drive a factor in w or h to complete sparsity, and thus numerical instability. That is the expectation.
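One way a fully sparse factor could turn into nan is through rescaling. The snippet below is an illustration only: it assumes a factor is divided by its column sum, which is an assumption about the solver rather than documented behaviour, and shows that an all-zero column then becomes 0 / 0.

```r
# Illustration only: assumes each factor (column of w) is rescaled by its
# column sum. This is an assumption, not confirmed RcppML behaviour.
w <- cbind(c(0.2, 0.5, 0.3), c(0, 0, 0))  # second factor is fully sparse
scale <- colSums(w)                        # second entry is exactly 0
w_scaled <- sweep(w, 2, scale, "/")        # second column becomes 0 / 0
is.nan(w_scaled[, 2])
```

Once a nan enters one factor, it would be expected to propagate through later updates, which would be consistent with every iteration reporting nan.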
Here's what my initial w matrix looks like:
> w_init <- nmf(biomass, k = 5, seed = 1, maxit = 0)$w
iter | tol
---------------
> w_init
          [,1]       [,2]      [,3]      [,4]      [,5]
[1,] 0.2655087 0.89838968 0.2059746 0.4976992 0.9347052
[2,] 0.3721239 0.94467527 0.1765568 0.7176185 0.2121425
[3,] 0.5728534 0.66079779 0.6870228 0.9919061 0.6516738
[4,] 0.9082078 0.62911404 0.3841037 0.3800352 0.1255551
[5,] 0.2016819 0.06178627 0.7698414 0.7774452 0.2672207
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
I also tried on a different machine:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
> library(RcppML)
> library(modeldata)
> library(Matrix)
>
> data(biomass)
>
> biomass <- biomass[, 3:7]
> biomass <- as.matrix(biomass)
> biomass <- t(biomass)
> biomass <- Matrix(biomass, sparse = TRUE)
>
> res <- nmf(biomass, k = 5, seed = 1, maxit = 5)
iter | tol
---------------
1 | 4.58e-01
2 | 1.03e-01
3 | 5.50e-03
4 | 8.68e-04
5 | 6.20e-04
convergence not reached in 5 iterations
(actual tol = 6.20e-04, target tol = 1.00e-04)
> all(is.na(res$w))
[1] FALSE
I updated to the GH version and was able to reproduce what you have. Thanks!
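Since the installed version seems to have mattered here, it may be worth including it in any future report. A minimal sketch, where installed_version is a hypothetical helper built on utils::packageVersion():

```r
# Report the installed version of a package as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA_character_)
  as.character(utils::packageVersion(pkg))
}

installed_version("RcppML")
```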