zhenglei-gao/KineticEval

CI for g with boundaries


Hi Zhenglei,

Here is the R file for the case where the upper and lower bounds for g are around 0.5. The confidence interval should be NA but is reported as <2e-16.
If I recall correctly, the CI is NA for boundaries [0.499,0.501], and <2e-16 again for [0.4,0.6].

Best regards
Johannes

# Trial           : TestcasesEvaWaterSediment11040124
# File name       : TestcasesEvaWaterSediment11040124 LM DFOP observed.r
# Target path     : C:\Emod\KinGUIIv2.1\WorkingDirectory\Testcases Eva WaterSediment
# Created         : on 04 Nov 2013
#                   at 01:53
#                   by ernom on ADEMONC7868(4CPUs)
# KinGUII version : 2.2013.1028.1144
# Comments        : 
# Residue File    : C:\Emod\KinGUIIv2.1\WorkingDirectory\Testcases Eva WaterSediment\Anglersee Gesamt.txt
# #Anglersee Gesamt 

# Check inst. versions of 'KineticEval' and 'R'
if(!require(KineticEval)) stop("Package KineticEval is not available")
if( as.numeric(version$major)<2||(as.numeric(version$major)==2 & as.numeric(version$minor)<15.1) ) stop("Please install R version > 2.15.0")


# Init 'R'
data(versioninfo)
require(methods)
require(logging)


# Create log file
basicConfig(level='FINEST')
addHandler(writeToFile, file="TestcasesEvaWaterSediment11040124 LM DFOP observed.log", level='DEBUG')


#
# Residue data, pathway and kinetics
#

 TestcasesEvaWaterSediment11040124 <- try(mkinmod.full(
   observed = list(
       time = c(    0,     0,     3,     3,     7,     7,    14,    14,    30,    30,    50,    50,    70,    70,   101,   101),
    residue = c(100.5,  99.6,  93.8,    93,  90.9,  90.9,  88.3,  88.6,  63.8,    73,  52.5,  28.7,  12.1,  18.1,   9.3,   6.2),
     weight = c(    1,     1,     1,     1,     1,     1,     1,     1,     1,     1,     1,     1,     1,     1,     1,     1),
                   sink  = TRUE,
       type = "DFOP",
         k1 = list(ini   = 0.01974000,
                   fixed = 0,
                   lower = 0.0,
                   upper = Inf),
         k2 = list(ini   = 0.0197400,
                   fixed = 0,
                   lower = 0.0,
                   upper = Inf),
          g = list(ini   = 0.5,
                   fixed = 0,
                   lower = 0.49999,
                   upper = 0.50001),
         M0 = list(ini   = 104.6,
                   fixed = 0,
                   lower = 0.0,
                   upper = Inf))),silent=TRUE)

#
# Fit and optimizer
#

  Fit    <- try(KinEval(
              TestcasesEvaWaterSediment11040124,
              evalMethod = 'IRLS',
             optimMethod = 'LM',
               plotfit   = TRUE,
               quiet     = TRUE,
               ctr       = kingui.control(
                              maxIter = 100,
                            tolerance = 1E-06,
                            odesolver = 'lsoda'),
            irls.control = list(
                              maxIter = 10,
                            tolerance = 0.001)),silent=TRUE)

#
# Output
#


 KinReport(TestcasesEvaWaterSediment11040124,
            Fit,
            version = versioninfo[1,1],
            filename = "TestcasesEvaWaterSediment11040124 LM DFOP observed")


#
# End of R-script
#


## Diagnostics: inspect the hessian and recompute the p-values by hand
require(MASS)                            # for ginv()
junk  <- qr(Fit$hessian)
covar <- ginv(0.5 * Fit$hessian)
se    <- sqrt(diag(covar) * Fit$ssr / Fit$df.residual)
tval  <- Fit$par / se
pt(tval, Fit$df.residual, lower.tail = FALSE)
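
## Why the reported p-value is <2e-16 instead of NA -- a minimal sketch
## with hypothetical numbers (the real values come from Fit above, with
## df.residual = 16 observations - 4 parameters = 12): when g is pinned
## at its bounds, ginv() of the singular hessian returns a zero variance
## for g, so se = 0, tval = Inf, and the p-value is exactly 0, which
## R's summary tables format as "<2e-16".
se.demo   <- c(3.86, 0.022, 0.020, 0)      # last entry: g at the boundary
par.demo  <- c(104.6, 0.0197, 0.0197, 0.5)
tval.demo <- par.demo / se.demo            # g: 0.5 / 0 = Inf
pt(tval.demo, df = 12, lower.tail = FALSE) # g: pt(Inf, ...) = 0 ("<2e-16")
## A zero (or NaN) se should instead be flagged and the CI set to NA.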

What follows is Paul Johnson's qrginv.R script, from
http://pj.freefaculty.org/scraps/profile/qrginv.R

## Paul Johnson
## 2012-12-11

## Searching for faster algorithms to calculate "close enough"
## Moore-Penrose inverses. I initially thought the fastest would
## be based on the 2011 paper by Katsikis, Pappas, and Petralias,
## but I can't get that to work faster than simple svd-based
## algorithms in MASS::ginv or Amelia:::mpinv.

## However, while working on that, I found the 2008 Katsikis and
## Pappas function "ginv", which seems to work well; it is easily
## implemented in R.

## There's a lot of numerical linear algebra I don't know, I'm
## just translating from Matlab code in the publications, then
## trying to use R tweaks to make it even faster.

## Function: KPPqrginv.
## R Implementation of Matlab function qrginv in Vasilios N. Katsikis,
## Dimitrios Pappas, Athanassios Petralias.  "An improved method for
## the computation of the Moore–Penrose inverse matrix," Applied
## Mathematics and Computation, 2011
##
## Here's the Matlab code from KPP (Katsikis, Pappas, Petralias)
## function qrginv = qrginv(B)
## [N,M] = size(B);
## [Q,R,P] = qr(B);
## r = sum(any(abs(R)>1e-13, 2));
## R1 = R(1:r,: )
## R2 = ginv(R1)
## R3 = [R2 zeros(M,N-r)]
## A = P*R3*Q'
## qrginv = A
## end


## Here is my first effort, which is SLOW SLOW.
## Cautions. 1) Matlab's any(, dim) dimension argument is backwards
## relative to R's apply() intuition. 2) Matlab returns full Q and R
## matrices, which R does not without complete = TRUE. LAPACK = TRUE
## is needed to get pivoting, which is supposed to help.
KPPqrginv <- function(B) {
    N <- nrow(B)
    M <- ncol(B)
    Bqr <- qr(B, LAPACK = TRUE)
    Q <- qr.Q(Bqr, complete = TRUE)
    ## need complete b/c A calculation below
    R <- qr.R(Bqr)
    P <- diag(length(Bqr$pivot))[, Bqr$pivot]
    ## R qr.R default removes the rows of all zeros from the bottom.
    ## columns with nonzero elements in R:
    r <- apply(R, 2, function(x) {any(zapsmall(x) != 0)})
    R2 <- KPginv(R[ , r])
    R3 <- cbind(R2, matrix(0, nrow = M, ncol = N - nrow(R)))
    A <- P %*% R3 %*% t(Q)
    A                                   # return A explicitly
}
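
## Quick toy illustration of the qr() cautions noted above (toy matrix,
## not from the fit): qr.Q() returns the "thin" factor unless
## complete = TRUE, and LAPACK = TRUE enables column pivoting, which is
## reported in $pivot.
B <- matrix(rnorm(12), nrow = 4, ncol = 3)
dim(qr.Q(qr(B)))                     # 4 x 3: the default "thin" Q
dim(qr.Q(qr(B), complete = TRUE))    # 4 x 4: full orthogonal Q
qr(B, LAPACK = TRUE)$pivot           # column permutation applied by LAPACK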

## Here is my second effort, which is SLOW compared
## to Amelia:::mpinv or MASS::ginv.
## Are there tricks to speed it up?
## Eliminate unneeded calculations?
## The previous try followed the Matlab code by right-padding R3 with
## lots of zeros, but that creates huge "waste" in R3. I wonder why it
## doesn't slow down Matlab? I'm using every algorithmic gimmick
## (tcrossprod). Still slow. Results are correct, but SLOW.

KPPqrginv2 <- function(B) {
    N <- nrow(B)
    M <- ncol(B)
    Bqr <- qr(B, LAPACK = TRUE)
    Q <- qr.Q(Bqr, complete = TRUE)
    ## need complete b/c A calculation below
    R <- qr.R(Bqr)
    P <- diag(length(Bqr$pivot))[ , Bqr$pivot]

    r <- apply(R, 2, function(x) {any(zapsmall(x) != 0)})
    R2 <- KPginv(R[ , r]) ##MP invert non-zero columns
    P %*% tcrossprod(R2, Q[ , 1:ncol(R2) ])
}



## KPginv: Katsikis and Pappas's generalized inverse for square or
## rectangular matrices that have rank = smallest dimension. This is
## used in KPPqrginv. It is also a free-standing function,
## however.
## V. N. Katsikis, D. Pappas, Fast computing of the Moore-Penrose inverse
## matrix, Electronic Journal of Linear Algebra, 17(2008), 637-650.
##
## In the KPP Matlab code for qrginv, this function is
## called "ginv", but here I call it KPginv. It is a non-svd based
## method of calculating an M-P generalized inverse for rectangular
## matrices whose rank equals min(dimensions), i.e. full rank.
##
## Here's that Matlab code:
##
## function ginv = ginv(X)
## if isempty(X)
## % quick return
## ginv = zeros(size(X'),class(X));
## return
## end
## [n,m]=size(X);
## if rank(X) < min(n,m);
## error('matrix must be of full rank');
## else
## if n > m,
## C = X'*X ;
## ginv = C\X';
## else
## C = X*X';
## G = C\X;
## ginv = G';
## end
## end
## end

## Function: KPginv:
## The R implementation of K & P's "ginv" function.
##
## Note: Matlab symbol A\B: Solve systems of linear equations Ax = B for x
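
## A two-line toy reminder of that correspondence:
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # symmetric, nonsingular
b <- c(1, 2)
solve(A, b)                            # R equivalent of Matlab's A\b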

KPginv <- function(X){
    Xdim <- dim(X)
    if (Matrix::rankMatrix(X) < min(Xdim)) {
        stop('matrix must be of full rank')
    } else if (Xdim[1] > Xdim[2]){
        C <- crossprod(X)
        ginv <- solve(C, t(X))
    } else {
        C <- tcrossprod(X)
        G <- solve(C, X) ## works. But can be faster?
        ginv <- t(G)
    }
    ginv
}

## Question: rankMatrix() takes time! Why bother? solve() will
## return an error, with the same effect as checking the rank first.
## Either way, the program croaks. Is doing the rank calculation
## faster? Why? solve() is going to do it anyway. Could wrap in
## tryCatch if we want the program to go on after this fails. Below,
## the results are verified, and this is FAST, much faster than
## Amelia:::mpinv or MASS::ginv.
KPginv <- function(X){
    Xdim <- dim(X)
    if (Xdim[1] > Xdim[2]){
        C <- crossprod(X)
        ginv <- solve(C, t(X))
    } else {
        C <- tcrossprod(X)
        G <- solve(C, X)
        ginv <- t(G)
    }
    ginv
}

## Note: R solve now uses LAPACK by default, which means
## it will "pick up" the BLAS implementation if R is configured
## for it.


X1 <- matrix(c(12,62,93,-8,22,16,2,87,43,91,-4,17,-72,95,6), 3)
X1KPPqrginv <- KPPqrginv(X1)
## verify for correctness
all.equal(X1 %*% X1KPPqrginv %*% X1, X1)
## Check against MASS::ginv result
all.equal(X1KPPqrginv, MASS::ginv(X1))

## Now try KPginv
X1KPginv <- KPginv(X1)
X1KPginv
all.equal(X1KPginv, MASS::ginv(X1))


X2 <- matrix(c(12,41,44,13,41,44, 55,11,44,555,333,555,55,22,55, 553,131,33), ncol=3)
X2inv.KPP <- KPPqrginv(X2)
all.equal(X2 %*% X2inv.KPP %*% X2, X2)
## Verify Same as other implementations
MASS::ginv(X2)
##
Amelia:::mpinv(X2)
##
(X2inv.KP <- KPginv(X2))
X2inv.KPP
## all.equal() compares only two objects at a time, so check pairwise:
all.equal(X2inv.KPP, X2inv.KP)
all.equal(X2inv.KPP, Amelia:::mpinv(X2))
all.equal(X2inv.KPP, MASS::ginv(X2))
## so far, so good.

X3 <- matrix(rnorm(100*10), 100, 10)
X3inv.KPP <- KPPqrginv(X3)
X3inv.KP <- KPginv(X3)
all.equal(X3 %*% X3inv.KPP %*% X3, X3)
all.equal(X3inv.KPP, MASS::ginv(X3))
all.equal(X3inv.KPP, X3inv.KP)
all.equal(X3inv.KPP, MASS::ginv(X3))
all.equal(X3inv.KPP, Amelia:::mpinv(X3))


## Well, seems to be correct for these test cases, anyway :)

## Now let's simulate a bunch of matrices and see what is fast.
nReps <- 100

## X symmetric and dense
## Xlist <- vector("list", nReps)
## for(i in 1:nReps){
##     x <- matrix(rnorm(1000*100), 1000, 100)
##     Xlist[[i]] <- crossprod(x, x)
## }

## X is long and narrow
Xlist <- vector("list", nReps)
for(i in 1:nReps){
    x <- matrix(rnorm(1500*20), 1500, 20)
    Xlist[[i]] <- x
}


## Test the 2 svd based generalized inverse methods, from
## MASS and Amelia

res2 <- vector("list", nReps)
set.seed(12345)
system.time(
    for(i in 1:nReps){
        res2[[i]] <- MASS::ginv(Xlist[[i]])
      ##  print(all.equal(X, X %*% as.matrix(res2[[i]]) %*% X))
    }
    )



res3 <- vector("list", nReps)
set.seed(12345)
system.time(
    for(i in 1:nReps){
        res3[[i]] <- Amelia:::mpinv(Xlist[[i]])
        ##  print(all.equal(Xlist[[i]], Xlist[[i]] %*% as.matrix(res3[[i]]) %*% Xlist[[i]]))
    }
    )



## Wow. Look at KPginv: much faster. Don't get too excited yet.
## (Recall it has a restricted range of application: basically, it
## can't cope with linearly dependent columns, but it can deal with a
## column of all 0's, and it can handle any "normal matrix" like X'X.)
res4 <- vector("list", nReps)
set.seed(12345)
system.time(
    for(i in seq_along(Xlist)){
        X <- Xlist[[i]]
        res4[[i]] <- KPginv(X)
        ## print(all.equal(X, X %*% res4[[i]] %*% X))
    }
    )



## Now watch this. It's horrible.
res5 <- vector("list", nReps)
set.seed(12345)
##Rprof("qrginvpj.out")
system.time(
    for(i in seq_along(Xlist)){
        X <- Xlist[[i]]
        res5[[i]] <- KPPqrginv(X)
        ## print(all.equal(X, X %*% res5[[i]] %*% X))
    }
     )
##Rprof(NULL)



## Here's the version of KPP I tricked out. Takes half
## as long as KPPqrginv, but still more than 20 times
## slower than KPginv.
res6<- vector("list", nReps)
set.seed(12345)
PROFILE <- 0
if (PROFILE) Rprof("qrginvpj.out")
system.time(
    for(i in seq_along(Xlist)){
        X <- Xlist[[i]]
        res6[[i]] <- KPPqrginv2(X)
        ## print(all.equal(X, X %*% res6[[i]] %*% X))
    }
    )
if (PROFILE){Rprof(NULL)
             summaryRprof("qrginvpj.out")
         }

## The profile output shows that the bottleneck is
## qr.qy and array.

## $by.self
##                 self.time self.pct total.time total.pct
## "qr.qy"              9.70    85.09       9.70     85.09
## "array"              1.18    10.35       1.18     10.35
## "tcrossprod"         0.12     1.05       0.12      1.05
## "qr.default"         0.10     0.88       0.12      1.05
## "zapsmall"           0.08     0.70       0.10      0.88
## "%*%"                0.06     0.53       0.06      0.53
## "qr.Q"               0.02     0.18      10.92     95.79
## "FUN"                0.02     0.18       0.12      1.05
## "solve"              0.02     0.18       0.04      0.35
## "$"                  0.02     0.18       0.02      0.18
## "colnames"           0.02     0.18       0.02      0.18
## "gc"                 0.02     0.18       0.02      0.18
## "log10"              0.02     0.18       0.02      0.18
## "solve.default"      0.02     0.18       0.02      0.18







## My suggestion for the original problem, which was a slow-running
## simulation in which the generalized inverse of a cross product was
## needed, is to use KPginv() when it works and then fall back to an
## SVD-based calculation when necessary. That function is presented in
## the file prof-1.R; a sketch of the idea follows.
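
## Minimal sketch of that fallback. The actual function in prof-1.R is
## not shown here, so this is only an assumption about its shape, using
## the tryCatch approach mentioned earlier: try the fast KPginv() first,
## and fall back to the SVD-based MASS::ginv() when solve() reports a
## computationally singular system.
ginvWithFallback <- function(X) {
    tryCatch(KPginv(X), error = function(e) MASS::ginv(X))
}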


## Create a linearly dependent column
X2.2 <- X2
X2.2 <- cbind(X2, 1.4*X2[ , 2])

KPginv(X2.2)
## fails
## Error in solve.default(C, t(X)) (from #5) :
##  system is computationally singular: reciprocal condition number = 1.58953e-17

KPPqrginv(X2.2)
## fails, same

MASS::ginv(X2.2)
## Succeeds
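
## Verify the SVD-based result on the rank-deficient matrix with the
## same Penrose check used above:
X2.2inv <- MASS::ginv(X2.2)
all.equal(X2.2 %*% X2.2inv %*% X2.2, X2.2)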


Console output comparing QR diagnostics of the hessian for the fit
above (junk, Fit) and a second fit (junk0, Fit0) not shown here:

> diag(qr.R(junk0))
[1] -1.468760e+02 -2.330847e+02  2.017359e-10
[4]  1.011775e-08
> diag(qr.R(junk))
  M0_observed   k1_observed   k2_observed    g_observed 
-1.468772e+02 -2.335327e+02 -3.390286e-05  0.000000e+00 
> diag(ginv(0.5* Fit$hessian))
[1] 1.492926e+01 4.779877e-04 3.893588e-04 0.000000e+00
> diag(ginv(0.5* Fit0$hessian))
[1] 1.492637e+01 1.241861e-03 1.090686e-03 4.828092e-14
> 
> junk0$qraux
[1] 1.001318e+00 1.715898e+00 2.000000e+00 1.011775e-08
> junk$qraux
[1] 1.001318 1.713860 2.000000 0.000000

Issue a warning.

This is a general issue that needs to be solved later.