bwlewis/irlba

Infinite Values for norm2

ilyakorsunsky opened this issue · 2 comments

Hi, for my larger datasets (250,000 x 2000) run with the R code (fastpath=FALSE), some of the data structures (e.g. V) hold values so large that the L2 norm (norm2) overflows to Inf. I then get errors in the comparisons involving R, S, and eps2, because R or S is infinite. I worked around this by scaling by the max of the vector before doing the L2 scaling (example below), and the code then runs to completion. However, I get really large results (e.g. max d is 1e150), which don't match the C implementation. I suspect these large values are themselves the result of a bug.

  V[, 1] <- max_scale(V[, 1])
  V[, 1] <- V[, 1] / norm2(V[, 1])
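The post does not show how max_scale is defined, so the following is only a minimal sketch of what such a helper might look like, assuming it divides a vector by its largest absolute entry. Both max_scale and norm2 here are hypothetical stand-ins written for illustration, not irlba's actual internals.

```r
# Hypothetical helper (assumption): divide by the largest absolute entry so
# that every element lies in [-1, 1] before any squaring takes place.
max_scale <- function(v) {
  m <- max(abs(v))
  if (!is.finite(m) || m == 0) return(v)  # nothing sensible to scale by
  v / m
}

# Stand-in for a naive 2-norm (assumption: not copied from irlba).
norm2 <- function(x) sqrt(drop(crossprod(x)))

# A vector whose squared entries overflow double precision.
v <- rep(sqrt(.Machine$double.xmax) * 10, 2)

u <- max_scale(v)   # entries are now at most 1 in magnitude
u <- u / norm2(u)   # ordinary L2 normalization, no overflow
```

After these two steps u has unit length, whereas norm2(v) on the unscaled vector is Inf.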

This may be a related issue: when I ran the C version (fastpath=TRUE) yesterday on the same data, I got the error message "BLAS/LAPACK routine 'DLASCL' gave error code -4". It seems that this error arises when there are NA or Inf values in the original matrix. I wonder whether it can also arise from Inf values in the L2 norm computation. Strangely, I ran the same thing today and didn't get this error, so if this is not an issue others have, please ignore.
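A quick way to rule out the first possibility is to check the input for non-finite entries before calling irlba. This is a small base-R sketch on a made-up dense matrix A (hypothetical data, not the dataset from this issue). The second check assumes that entries larger than sqrt(.Machine$double.xmax) are the ones at risk, since their squares overflow.

```r
# Toy dense matrix with one NA and one Inf (illustrative data only).
A <- matrix(c(1, 2, NA, 4, Inf, 6), nrow = 3)

# is.finite() is FALSE for NA, NaN, Inf and -Inf.
bad <- !is.finite(A)
has_bad <- any(bad)

# Row/column positions of the offending entries.
bad_idx <- which(bad, arr.ind = TRUE)

# Finite entries can still be large enough that their squares overflow.
largest <- max(abs(A[!bad]))
risky <- largest > sqrt(.Machine$double.xmax)
```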

Thanks for looking into this!

Yes indeed, I can replicate this behavior with badly scaled data; it is caused by floating-point overflow. For example:

x = rep(sqrt(.Machine$double.xmax) * 10, 2)
# now its 2-norm:
sqrt(drop(crossprod(x)))
[1] Inf

However, I have not yet been able to cook up a toy example that illustrates significant differences between the R and C code paths.

In any case, I don't yet have a great solution. Am open to ideas!
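One idea, offered only as a sketch: compute the 2-norm with the largest absolute entry factored out, so that nothing larger than 1 is ever squared. The name norm2_safe is hypothetical and this is not how irlba currently computes its norms. It only avoids overflow in the norm itself and would not explain the very large d values reported above.

```r
# Hypothetical overflow-resistant 2-norm: ||x|| = s * ||x / s|| with s = max|x_i|.
norm2_safe <- function(x) {
  s <- max(abs(x))
  # Pass NA/NaN/Inf through unchanged, and avoid dividing by zero.
  if (!is.finite(s) || s == 0) return(s)
  s * sqrt(sum((x / s)^2))
}

x <- rep(sqrt(.Machine$double.xmax) * 10, 2)

naive <- sqrt(drop(crossprod(x)))  # overflows: squares exceed double.xmax
safe  <- norm2_safe(x)             # stays finite
```

The scaled version costs one extra pass over the vector, and it agrees with the naive formula on well-scaled input, e.g. norm2_safe(c(3, 4)) is 5.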