Colname errors
qianchd opened this issue · 1 comments
Describe the bug
In some extreme cases, the colnames can be duplicated, for example colnames=("", "", "", "") or ("X", "X", "X", "X"). This makes the function predict.abess fail when it tries to run newx <- newx[, vn]. However, the main fitting function abess::abess still passes.
Code for Reproduction
n=100
p=50
b=c(1,1,rep(0, p - 2))
X = matrix(rnorm(n * p), n, p)
y <- X %*% b + rnorm(n)
colnames(X) <- rep("", p)
md <- abess::abess(X, y) # it passes the test
predict(md, newx=X[1:10, ]) # Error in newx[, vn] : subscript out of bounds
Expected behavior
The colnames of X need to be checked in the abess::abess function. If the colnames are duplicated, either an error should be raised or the matrix X should be treated as an unnamed matrix (colnames(X) is NULL). An alternative is to check whether the row names rownames(object[["beta"]]) are a subset of the colnames of newx, which would improve on the line if (!is.null(colnames(newx))) in the predict.abess function.
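The subset check suggested above could look roughly like the following base R sketch. The helper name align_newx and its arguments are hypothetical and are not part of the abess package. It assumes the names stored at fit time (vn, e.g. rownames(object[["beta"]])) are non-empty and unique.

```r
# Hypothetical helper, not part of abess: select the fitted variables
# from newx by name, failing early with a readable message.
align_newx <- function(newx, vn) {
  cn <- colnames(newx)
  if (is.null(cn)) {
    # Unnamed matrix: leave the columns in the order they were given.
    return(newx)
  }
  # Names used at fit time that newx does not provide.
  missing_vars <- setdiff(vn, cn)
  if (length(missing_vars) > 0) {
    stop("newx is missing columns: ", paste(missing_vars, collapse = ", "))
  }
  # drop = FALSE keeps a matrix even when a single column is selected.
  newx[, vn, drop = FALSE]
}

X <- matrix(1:6, 2, 3, dimnames = list(NULL, c("a", "b", "c")))
sub <- align_newx(X, c("c", "a"))
```

With a check like this, a mismatch between the stored names and newx is reported by name instead of surfacing as "subscript out of bounds".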
Thanks for your valuable feedback, and I'm pleased to incorporate one of your suggestions into the program.
abess prioritizes column names over positions for identification, so duplicated names cause confusion. We plan to add the following check to avoid this situation:
if (length(unique(para$vn)) != length(para$vn)) {
  stop("The colnames of x are duplicated!")
}
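To try the planned check in isolation, here is a minimal sketch in base R. The wrapper check_colnames is hypothetical; it uses anyDuplicated, which for a character vector is an equivalent test to comparing length(unique(...)) against length(...). Treating a matrix without colnames as valid is an assumption made here, not something stated in the thread.

```r
# Hypothetical wrapper around the duplicate-name check; not abess code.
check_colnames <- function(x) {
  vn <- colnames(x)
  # anyDuplicated() returns the index of the first duplicate, or 0 if none.
  if (!is.null(vn) && anyDuplicated(vn) > 0) {
    stop("The colnames of x are duplicated!")
  }
  invisible(x)
}

X <- matrix(rnorm(20), 5, 4)
colnames(X) <- rep("", 4)  # the case from the report: all names are ""

# Capture the error message instead of aborting the script.
res <- tryCatch(check_colnames(X), error = function(e) conditionMessage(e))
print(res)
```

Running the check at fit time means the problem is reported by abess::abess itself rather than later by predict.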
Once again, thank you for your assistance.