malucalle/selbal

Error with more than 2 variables in the balance

Opened this issue · 6 comments

Dear selbal developers,

When I run selbal.cv() on my datasets, it works only when 2 variables are selected in the balance.
When I run the same code but force it to select more variables (with the opt.cri or opt.nvar parameters),
I get the following error after the cross-validation procedure has finished (and the optimal number of variables has been printed):
Error in [.data.frame(LogCounts, , c(POS, x)) :
undefined columns selected

I am not sure whether this also happens when more than 2 variables are selected using the default settings, because with my data, coincidentally, only 2 variables are ever selected in that case.

Many thanks in advance for your help!

I created a pull request with fixes that worked for my datasets:
"Small fixes to selbal() and selbal.aux() ensuring variables are drawn from logCounts."
Cheers!

Thank you for your support @ChVav! I saw your proposed changes, but I really don't know how to accept them. I am not familiar with GitHub; the only thing I know how to do is correct the code manually.

Do you know how I can accept your proposals?

No problem!
This page describes how to merge a pull request: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/merging-a-pull-request-with-a-merge-queue.
If you still have a local repository with your code, you can then run "git pull" so that the changes in the remote GitHub repo are copied over locally.
Hope this helps!

I downloaded the code modified by ChVav, but I am still getting errors when I try it on the HIV data file. I am using R 4.3.3 on Windows 10. I should mention that the code doesn't work with the library either.

My code:

source("C:/Users/xxx/OneDrive -xxx/Methods/selbal/selbal_functions.R")

# Define x, y and z

x <- selbal::HIV[,1:60]
y <- selbal::HIV[,62]
z <- data.frame(MSM = selbal::HIV[,61])

# Run selbal.cv function (with the default values for zero.rep and opt.cri)

CV.BAL.dic <- selbal.cv(x = x, y = y, n.fold = 5, n.iter = 10, covar = z, logit.acc = "AUC")

###############################################################
STARTING selbal.cv FUNCTION
###############################################################

#-------------------------------------------------------------#

ZERO REPLACEMENT . . .

Loading required package: MASS
Loading required package: NADA
Loading required package: survival

Attaching package: ‘NADA’

The following object is masked from ‘package:stats’:

cor

Loading required package: truncnorm

, . . . FINISHED.
#-------------------------------------------------------------#

#-------------------------------------------------------------#

Starting the cross - validation procedure . . .

. . . finished.
#-------------------------------------------------------------#
###############################################################

The optimal number of variables is: 4

Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 't': undefined columns selected
In addition: Warning messages:
1: In cmultRepl(x, suppress.print = T) :
Column no. 49 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Column no. 53 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).

2: In e$fun(obj, substitute(ex), parent.frame(), e$data) :
already exporting variable(s): logit.acc
3: In cmultRepl(x, suppress.print = T) :
Column no. 49 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Column no. 53 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
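
The warnings above say that columns with more than 80% zeros were deleted during zero replacement, so the processed table has fewer columns than the input x. A minimal base R sketch for checking in advance which columns cross that threshold is shown here. The toy data frame and its column names are made up for illustration; only the 80% cutoff comes from the warning text.

```r
# Toy count table standing in for the real data (values are invented).
x <- data.frame(
  taxonA = c(5, 0, 3, 8, 2, 7, 1, 4, 6, 9),
  taxonB = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 3),  # 9 of 10 entries are zero
  taxonC = c(1, 2, 0, 4, 0, 6, 7, 0, 9, 10)
)

# Fraction of zero entries in each column.
zero_frac <- colMeans(x == 0)

# Columns whose zero fraction exceeds the 80% cutoff named in the warning.
sparse_cols <- names(zero_frac)[zero_frac > 0.8]
print(sparse_cols)
```

Any column listed this way is a candidate for deletion, which means column names taken from the original x may no longer exist afterwards.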

I found that the change below, on line 1357 of the selbal_functions.R file posted by ChVav, fixes the problem:
var.nam <- rem.nam <- colnames(x)
to
var.nam <- rem.nam <- colnames(logCounts)
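
A small base R sketch of why this change plausibly helps, assuming the error comes from indexing the processed table with names of columns that were deleted from it. The data frame, the dropped column, and the variable names stale_names and fresh_names are illustrative and are not taken from the selbal source.

```r
# Original input with three columns.
x <- data.frame(a = c(1, 2, 3), b = c(0, 0, 0), c = c(4, 5, 6))

# Stand-in for zero replacement: column "b" is dropped before taking logs.
logCounts <- log(x[, c("a", "c")])

# Names taken from the original input still include the dropped column,
# so indexing the processed table with them raises an error.
stale_names <- colnames(x)
result <- tryCatch(
  logCounts[, stale_names],
  error = function(e) conditionMessage(e)
)
print(result)  # the captured error message

# Names taken from the processed table always match its columns.
fresh_names <- colnames(logCounts)
ok <- logCounts[, fresh_names]
print(colnames(ok))
```

Drawing the candidate names from the same object that is later indexed keeps the two in sync whether or not any columns were removed.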

Thank you @bl6594! The change has been made.