CI for g with boundaries
Closed this issue · 6 comments
zhenglei-gao commented
Hi Zhenglei,
Here is the R file for the case where the upper and lower bounds for g are around 0.5. The confidence interval should be NA but is reported as <2e-16.
If I recall correctly, the CI is NA for boundaries [0.499,0.501], and <2e-16 again for [0.4,0.6].
Best regards
Johannes
zhenglei-gao commented
# Trial : TestcasesEvaWaterSediment11040124
# File name : TestcasesEvaWaterSediment11040124 LM DFOP observed.r
# Target path : C:\Emod\KinGUIIv2.1\WorkingDirectory\Testcases Eva WaterSediment
# Created : on 04 Nov 2013
# at 01:53
# by ernom on ADEMONC7868(4CPUs)
# KinGUII version : 2.2013.1028.1144
# Comments :
# Residue File : C:\Emod\KinGUIIv2.1\WorkingDirectory\Testcases Eva WaterSediment\Anglersee Gesamt.txt
# #Anglersee Gesamt
# Check inst. versions of 'KineticEval' and 'R'
if(!require(KineticEval)) stop("Package KineticEval is not available")
if( as.numeric(version$major)<2||(as.numeric(version$major)==2 & as.numeric(version$minor)<15.1) ) stop("Please install R version > 2.15.0")
# Init 'R'
data(versioninfo)
require(methods)
require(logging)
# Create log file
basicConfig(level='FINEST')
addHandler(writeToFile, file="TestcasesEvaWaterSediment11040124 LM DFOP observed.log", level='DEBUG')
#
# Residue data, pathway and kinetics
#
TestcasesEvaWaterSediment11040124 <- try(mkinmod.full(
observed = list(
time = c( 0, 0, 3, 3, 7, 7, 14, 14, 30, 30, 50, 50, 70, 70, 101, 101),
residue = c(100.5, 99.6, 93.8, 93, 90.9, 90.9, 88.3, 88.6, 63.8, 73, 52.5, 28.7, 12.1, 18.1, 9.3, 6.2),
weight = c( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
sink = TRUE,
type = "DFOP",
k1 = list(ini = 0.01974000,
fixed = 0,
lower = 0.0,
upper = Inf),
k2 = list(ini = 0.0197400,
fixed = 0,
lower = 0.0,
upper = Inf),
g = list(ini = 0.5,
fixed = 0,
lower = 0.49999,
upper = 0.50001),
M0 = list(ini = 104.6,
fixed = 0,
lower = 0.0,
upper = Inf))),silent=TRUE)
#
# Fit and optimizer
#
Fit <- try(KinEval(
TestcasesEvaWaterSediment11040124,
evalMethod = 'IRLS',
optimMethod = 'LM',
plotfit = TRUE,
quiet = TRUE,
ctr = kingui.control(
maxIter = 100,
tolerance = 1E-06,
odesolver = 'lsoda'),
irls.control = list(
maxIter = 10,
tolerance = 0.001)),silent=TRUE)
#
# Output
#
KinReport(TestcasesEvaWaterSediment11040124,
Fit,
version = versioninfo[1,1],
filename = "TestcasesEvaWaterSediment11040124 LM DFOP observed")
#
# End of R-script
#
zhenglei-gao commented
require(MASS)                                        # ginv() comes from MASS
junk <- qr(Fit$hessian)                              # QR decomposition, to inspect the rank of the Hessian
covar <- ginv(0.5 * Fit$hessian)                     # pseudo-inverse of half the Hessian
se <- sqrt(diag(covar) * Fit$ssr / Fit$df.residual)  # standard errors of the estimates
tval <- Fit$par / se                                 # t statistics; Inf when se is numerically 0
pt(tval, Fit$df.residual, lower.tail = FALSE)        # one-sided p-values; 0 when tval is Inf
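This is where the spurious value comes from: with g squeezed between 0.49999 and 0.50001, the corresponding row and column of the Hessian are numerically zero, ginv() returns a ~0 variance for g, tval blows up to Inf, and pt() returns 0, which the report prints as <2e-16. A minimal sketch of a guard (the helper name is hypothetical, and this is not KineticEval's actual code) that returns NA instead whenever the estimated variance is numerically zero:
## Hypothetical guard: report NA for any parameter whose estimated variance
## is numerically zero (e.g. g pinned between nearly identical bounds).
boundedPvals <- function(Fit, tol = .Machine$double.eps) {
  covar  <- MASS::ginv(0.5 * Fit$hessian)
  varpar <- diag(covar) * Fit$ssr / Fit$df.residual
  se     <- ifelse(varpar > tol, sqrt(varpar), NA)  # NA instead of se ~ 0
  tval   <- Fit$par / se                            # NA propagates to tval ...
  pt(tval, Fit$df.residual, lower.tail = FALSE)     # ... and to the p-value
}
With the [0.49999, 0.50001] bounds above, this should report NA for g instead of <2e-16.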
zhenglei-gao commented
http://pj.freefaculty.org/scraps/profile/qrginv.R
## Paul Johnson
## 2012-12-11
## Searching for faster algorithms to calculate "close enough"
## Moore Penrose Inverses. I initially thought the fastest would
## be based on the 2011 paper by Katsikis, Pappas, and Petralias,
## but I can't get that to work faster than simple svd based
## algorithms in MASS::ginv or Amelia:::mpinv.
## However, while working on that, I found the 2008 Katsikis and
## Pappas function "ginv", which seems to work well; it is easily
## implemented in R.
## There's a lot of numerical linear algebra I don't know; I'm
## just translating from Matlab code in the publications, then
## trying to use R tweaks to make it even faster.
## Function: KPPqrginv.
## R Implementation of Matlab function qrginv in Vasilios N. Katsikis,
## Dimitrios Pappas, Athanassios Petralias. "An improved method for
## the computation of the Moore–Penrose inverse matrix," Applied
## Mathematics and Computation, 2011
##
## Here's the Matlab code from KPP (Katsikis, Pappas, Petralias)
## function qrginv = qrginv(B)
## [N,M] = size(B);
## [Q,R,P] = qr(B);
## r = sum(any(abs(R)>1e-13, 2));
## R1 = R(1:r,: )
## R2 = ginv(R1)
## R3 = [R2 zeros(M,N-r)]
## A = P*R3*Q'
## qrginv = A
## end
## Here is my first effort, which is SLOW SLOW.
## Cautions. 1) Matlab's any(, dim) dimension argument is backwards relative to R intuition
## 2) Matlab returns full Q and R matrices, which R does
## not without complete = TRUE. LAPACK=TRUE needed to
## get pivoting, which is supposed to help.
KPPqrginv <- function(B) {
N <- nrow(B)
M <- ncol(B)
Bqr <- qr(B, LAPACK = TRUE)
Q <- qr.Q(Bqr, complete = TRUE)
## need complete b/c A calculation below
R <- qr.R(Bqr)
P <- diag(length(Bqr$pivot))[, Bqr$pivot]
## R qr.R default removes the rows of all zeros from the bottom.
## columns with nonzero elements in R:
r <- apply(R, 2, function(x) {any(zapsmall(x) != 0)})
R2 <- KPginv(R[ , r]);
R3 <- cbind(R2, matrix(0, nrow=M, ncol = N - nrow(R)))
A <- P %*% R3 %*% t(Q)
A  ## return the pseudo-inverse explicitly (a bare assignment returns invisibly)
}
## Here is my second effort, which is SLOW compared
## to Amelia::mpinv or MASS::ginv.
## Are there tricks to speed it up?
## Eliminate unneeded calculations?
## Previous try followed the Matlab code by right-padding R3 with lots of
## zeros, but that creates huge "waste" in R3. I wonder why it doesn't
## slow down Matlab? I'm using every algorithmic gimmick (tcrossprod).
## Still slow. Results are correct, but SLOW.
KPPqrginv2 <- function(B) {
N <- nrow(B)
M <- ncol(B)
Bqr <- qr(B, LAPACK = TRUE)
Q <- qr.Q(Bqr, complete = TRUE)
## need complete b/c A calculation below
R <- qr.R(Bqr)
P <- diag(length(Bqr$pivot))[ , Bqr$pivot]
r <- apply(R, 2, function(x) {any(zapsmall(x) != 0)})
R2 <- KPginv(R[ , r]) ##MP invert non-zero columns
P %*% tcrossprod(R2, Q[ , 1:ncol(R2) ])
}
## KPginv: Katsikis and Pappas's generalized inverse for square or
## rectangular matrices that have rank = smallest dimension. This is
## used in KPPqrginv. It is also a free-standing function,
## however.
## V. N. Katsikis, D. Pappas, Fast computing of the Moore-Penrose inverse
## matrix, Electronic Journal of Linear Algebra, 17(2008), 637-650.
##
## In the KPP Matlab code for qrginv, that is
## called "ginv", but here I call it KPginv. It is a non-svd based
## method of calculating an M-P generalized inverse for rectangular
## matrices that have full rank, i.e. rank equal to min(dimensions).
##
## Here's that Matlab code:
##
## function ginv = ginv(X)
## if isempty(X)
## % quick return
## ginv = zeros(size(X'),class(X));
## return
## end
## [n,m]=size(X);
## if rank(X) < min(n,m);
## error('matrix must be of full rank');
## else
## if n > m,
## C = X'*X ;
## ginv = C\X';
## else
## C = X*X';
## G = C\X;
## ginv = G';
## end
## end
## end
## Function: KPginv:
## The R implementation of K & P's "ginv" function.
##
## Note: Matlab symbol A\B: Solve systems of linear equations Ax = B for x
KPginv <- function(X){
Xdim <- dim(X)
if (Matrix::rankMatrix(X) < min(Xdim)) {
stop('matrix must be of full rank')
} else if (Xdim[1] > Xdim[2]){
C <- crossprod(X)
ginv <- solve(C, t(X))
} else {
C <- tcrossprod(X)
G <- solve(C, X) ## works. But can be faster?
ginv <- t(G)
}
ginv
}
## Question: rankMatrix() takes time! Why bother? solve() will
## return an error, same effect as checking the rank first. Either
## way, the program croaks. Is doing the rank calculation first faster? Why?
## solve() is going to do it anyway. Could wrap in tryCatch if we want the
## program to go on after this fails (a tryCatch sketch appears below).
## Below, the results are verified, and this is FAST, much faster than
## Amelia:::mpinv or MASS::ginv.
KPginv <- function(X){
Xdim <- dim(X)
if (Xdim[1] > Xdim[2]){
C <- crossprod(X)
ginv <- solve(C, t(X))
} else {
C <- tcrossprod(X)
G <- solve(C, X)
ginv <- t(G)
}
ginv
}
## Note: R solve now uses LAPACK by default, which means
## it will "pick up" the BLAS implementation if R is configured
## for it.
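Related to the tryCatch remark above: a minimal sketch of a fallback wrapper (the name KPginvSafe is hypothetical, and this is not the function from prof-1.R) that tries the fast KPginv() first and drops back to the SVD-based MASS::ginv() when solve() fails on a rank-deficient input.
## Hypothetical wrapper: fast path via KPginv(), SVD-based fallback via MASS::ginv().
KPginvSafe <- function(X) {
  tryCatch(KPginv(X),
           error = function(e) MASS::ginv(X))
}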
X1 <- matrix(c(12,62,93,-8,22,16,2,87,43,91,-4,17,-72,95,6), 3)
X1KPPqrginv <- KPPqrginv(X1)
## verify for correctness
all.equal(X1 %*% X1KPPqrginv %*% X1, X1)
## Check against MASS::ginv result
all.equal(X1KPPqrginv, MASS::ginv(X1))
## Now try KPginv
X1KPging <- KPginv(X1)
X1KPging
all.equal(X1KPging, MASS::ginv(X1))
X2 <- matrix(c(12,41,44,13,41,44, 55,11,44,555,333,555,55,22,55, 553,131,33), ncol=3)
X2inv.KPP <- KPPqrginv(X2)
all.equal(X2 %*% X2inv.KPP %*% X2, X2)
## Verify Same as other implementations
MASS::ginv(X2)
##
Amelia:::mpinv(X2)
##
(X2inv.KP <- KPginv(X2))
X2inv.KPP
## all.equal() compares two objects at a time, so check pairwise
all.equal(X2inv.KPP, X2inv.KP)
all.equal(X2inv.KPP, Amelia:::mpinv(X2))
all.equal(X2inv.KPP, MASS::ginv(X2))
## so far, so good.
X3 <- matrix(rnorm(100*10), 100, 10)
X3inv.KPP <- KPPqrginv(X3)
X3inv.KP <- KPginv(X3)
all.equal(X3 %*% X3inv.KPP %*% X3, X3)
all.equal(X3inv.KPP, MASS::ginv(X3))
all.equal(X3inv.KPP, X3inv.KP)
all.equal(X3inv.KPP, MASS::ginv(X3))
all.equal(X3inv.KPP, Amelia:::mpinv(X3))
## Well, seems to be correct for these test cases, anyway :)
## Now let's simulate a bunch of matrices and see what is fast.
nReps <- 100
## X symmetric and dense
## Xlist <- vector("list", nReps)
## for(i in 1:nReps){
## x <- matrix(rnorm(1000*100), 1000, 100)
## Xlist[[i]] <- crossprod(x, x)
## }
## X is long and narrow
Xlist <- vector("list", nReps)
for(i in 1:nReps){
x <- matrix(rnorm(1500*20), 1500, 20)
Xlist[[i]] <- x
}
## Test the 2 svd based generalized inverse methods, from
## MASS and Amelia
res2 <- vector("list", nReps)
set.seed(12345)
system.time(
for(i in 1:nReps){
res2[[i]] <- MASS::ginv(Xlist[[i]])
## print(all.equal(X, X %*% as.matrix(res2[[i]]) %*% X))
}
)
res3 <- vector("list", nReps)
set.seed(12345)
system.time(
for(i in 1:nReps){
res3[[i]] <- Amelia:::mpinv(Xlist[[i]])
## print(all.equal(Xlist[[i]], Xlist[[i]] %*% as.matrix(res3[[i]]) %*% Xlist[[i]]))
}
)
## Wow. Look at KPginv: much faster. Don't get too excited yet. (Recall
## it has some restrictions. Basically, it can't cope with
## linearly dependent columns, but it can deal with a column of all 0's.)
## It can handle any "normal matrix" like X'X.
res4 <- vector("list", nReps)
set.seed(12345)
system.time(
for(i in seq_along(Xlist)){
X <- Xlist[[i]]
res4[[i]] <- KPginv(X)
##print(all.equal(X, X %*% res4[[i]] %*% X))
}
)
## Now watch this. It's horrible.
res5 <- vector("list", nReps)
set.seed(12345)
##Rprof("qrginvpj.out")
system.time(
for(i in seq_along(Xlist)){
X <- Xlist[[i]]
res5[[i]] <- KPPqrginv(X)
##print(all.equal(X, X %*% res5[[i]] %*% X))
}
)
##Rprof(NULL)
## Here's the version of KPP I tricked out. Takes half
## as long as KPPqrginv, but still more than 20 times
## slower than KPginv.
res6<- vector("list", nReps)
set.seed(12345)
PROFILE <- 0
if (PROFILE) Rprof("qrginvpj.out")
system.time(
for(i in seq_along(Xlist)){
X <- Xlist[[i]]
res6[[i]] <- KPPqrginv2(X)
## print(all.equal(X, X %*% res6[[i]] %*% X))
}
)
if (PROFILE){Rprof(NULL)
summaryRprof("qrginvpj.out")
}
## The profile output shows that the bottleneck is
## qr.qy and array.
## $by.self
##                  self.time self.pct total.time total.pct
## "qr.qy"               9.70    85.09       9.70     85.09
## "array"               1.18    10.35       1.18     10.35
## "tcrossprod"          0.12     1.05       0.12      1.05
## "qr.default"          0.10     0.88       0.12      1.05
## "zapsmall"            0.08     0.70       0.10      0.88
## "%*%"                 0.06     0.53       0.06      0.53
## "qr.Q"                0.02     0.18      10.92     95.79
## "FUN"                 0.02     0.18       0.12      1.05
## "solve"               0.02     0.18       0.04      0.35
## "$"                   0.02     0.18       0.02      0.18
## "colnames"            0.02     0.18       0.02      0.18
## "gc"                  0.02     0.18       0.02      0.18
## "log10"               0.02     0.18       0.02      0.18
## "solve.default"       0.02     0.18       0.02      0.18
## My suggestion for the original problem, which was a slow-running
## simulation in which the generalized inverse of a cross product was
## needed, is to use KPginv() when it works and then fall back to an
## SVD-based calculation when necessary. That function is presented in
## the file prof-1.R.
## Create a linearly dependent column
X2.2 <- X2
X2.2 <- cbind(X2, 1.4*X2[ , 2])
KPginv(X2.2)
## fails
## Error in solve.default(C, t(X)) (from #5) :
## system is computationally singular: reciprocal condition number = 1.58953e-17
KPPqrginv(X2.2)
## fails, same
MASS::ginv(X2.2)
## Succeeds
zhenglei-gao commented
> diag(qr.R(junk0))
[1] -1.468760e+02 -2.330847e+02 2.017359e-10
[4] 1.011775e-08
> diag(qr.R(junk))
M0_observed k1_observed k2_observed g_observed
-1.468772e+02 -2.335327e+02 -3.390286e-05 0.000000e+00
> diag(ginv(0.5* Fit$hessian))
[1] 1.492926e+01 4.779877e-04 3.893588e-04 0.000000e+00
> diag(ginv(0.5* Fit0$hessian))
[1] 1.492637e+01 1.241861e-03 1.090686e-03 4.828092e-14
>
> junk0$qraux
[1] 1.001318e+00 1.715898e+00 2.000000e+00 1.011775e-08
> junk$qraux
[1] 1.001318 1.713860 2.000000 0.000000
zhenglei-gao commented
Issue a warning.
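The zero on the diagonal of qr.R(junk) for g_observed above shows that the Hessian is numerically rank-deficient, so no variance can be estimated for g. A minimal sketch of how such a warning might be raised (the helper is hypothetical, not KineticEval code, and it assumes the Hessian carries parameter names as column names, as in the output above):
## Hypothetical check: flag parameters whose pivoted R diagonal is ~0 and warn,
## so their CI/p-value can be reported as NA instead of <2e-16.
flagDeficient <- function(hessian, tol = 1e-8) {
  Hqr <- qr(hessian)
  d   <- abs(diag(qr.R(Hqr)))
  bad <- rep(FALSE, ncol(hessian))
  bad[Hqr$pivot[d < tol * max(d)]] <- TRUE  # map R's diagonal back through the pivot
  if (any(bad))
    warning("Hessian is numerically rank-deficient; reporting NA for: ",
            paste(colnames(hessian)[bad], collapse = ", "))
  bad
}
Applied to Fit$hessian above, g_observed should be flagged, and its CI/p-value can then be set to NA.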
zhenglei-gao commented
A general issue that needs to be solved later.