I got... nrow(x) > 1 is not TRUE
Closed this issue · 6 comments
I get...
Error in { : task 1 failed - "nrow(x) > 1 is not TRUE"
...any ideas?
Thx a lot.
Provide the data so I can replicate the error
For convenience I changed the data a little:
data <- data.frame(
"col1"=c(1,2,3,4),
"col2"=c(6,8,9,10),
"col3"=c(11,12,13,14),
"col4"=c(15,16,14,18)
)
library(dplyr) # select() below comes from dplyr
# Data preparation
data2=na.omit(data) # <- use with care...
data_y=as.factor(data2$col1)
data_x=select(data2, col2, col3, col4)
# GA parameters
param_nBits=ncol(data_x)
col_names=colnames(data_x)
[...]
Now I get: Error in { :
task 1 failed - "One or more factor levels in the outcome has no data: '1'"
By the way, the illustration is ingenious.
With so few cases, this will not work, even for the fitness function (you can try your data in the classifier first). In addition, the script optimizes the ROC AUC value, which is used for binary classification (your example has 4 levels). Check your data inside custom_fitness.
The illustration comes from: https://www.goatstream.com/research/papers/SA2013/ (and yes, it's amazing)
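One way to see why the outcome is a problem for a ROC-based fitness is to look at the factor levels directly. The sketch below rebuilds the example data posted in this thread and tabulates data_y. It uses only base R and does not call anything from lib_ga.R; the names n_levels and counts are made up for illustration.

```r
# Example data as posted earlier in this thread
data <- data.frame(
  col1 = c(1, 2, 3, 4),
  col2 = c(6, 8, 9, 10),
  col3 = c(11, 12, 13, 14),
  col4 = c(15, 16, 14, 18)
)
data_y <- as.factor(data$col1)

# Number of distinct classes in the outcome; ROC AUC expects exactly 2
n_levels <- nlevels(data_y)

# Observations per class; here every class has a single row
counts <- table(data_y)

print(n_levels)
print(counts)
```

With four classes of one row each, any resampling split is bound to leave some class without data, which would be consistent with the "no data" message above.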
The few cases are just for illustration. Hm, seems I should transform the data into a classification problem. Is this right? For example with a cut()-function?...
[...]
Convert a polynomial column to an additional binary feature:
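library(dplyr) # needed for %>% and mutate()
df <- subset(data, select = c("col1"))
# include.lowest = TRUE closes the last interval, so the value 4 lands in "C" instead of NA
catCol1 <- cut(df$col1, breaks = c(1, 2, 3, 4), labels = c("A", "B", "C"), right = FALSE, include.lowest = TRUE)
res <- data.frame("orig" = as.vector(df$col1), "cat" = as.vector(catCol1))
myVar <- "IS_A_CAT"
df1 <- res %>% mutate((!!as.name(myVar)) := ifelse(cat == "A", 1, 0))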
[...]
What do you mean by "custom_function" and "in the classifier"?
If data_y should be a binary factor, what formats should/could data_x contain?
Sorry, the function is custom_fitness. Inside it you can find get_roc_metric, which calculates the fitness value. You can find it here: https://github.com/pablo14/genetic-algorithm-feature-selection/blob/master/lib_ga.R
As you can see, it uses the caret package behind the scenes. If you want to adapt it to multi-class or regression problems, you have to change this function. You can put whatever function you want in that place (that's why I left the code ready to be changed).
And yes, you have to transform the data into a binary classification problem.
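A minimal sketch of such a transformation, using the example data from this thread. The median split and the level names "low" and "high" are assumptions chosen purely for illustration; the right cut-off depends on what col1 means in the real data.

```r
# Example data as posted earlier in this thread (only col1 is needed here)
data <- data.frame(col1 = c(1, 2, 3, 4))

# Hypothetical rule: values above the median become "high", the rest "low"
threshold <- median(data$col1)

# Build a two-level factor with an explicit level order
data_y <- factor(
  ifelse(data$col1 > threshold, "high", "low"),
  levels = c("low", "high")
)

print(table(data_y))
```

Level names that are valid R variable names (such as "low" and "high" rather than "0" and "1") are the safer choice, since caret uses them as column names when class probabilities are requested.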
Before applying the GA, you should run get_roc_metric with your data and check that it finishes without errors (returning the metric value you want, ROC by default).
Is it clearer now? I made it clearer in the repo: https://github.com/pablo14/genetic-algorithm-feature-selection#how-to-run-the-example.
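The pre-check suggested above can also be approximated without the repo code. The following is a rough sketch that calls caret directly; it is not the repo's get_roc_metric, and the synthetic data, the "glm" method, and the 5-fold setup are all assumptions made for illustration. It needs the caret package installed, and caret may ask for additional packages (such as pROC) to compute the ROC summary.

```r
library(caret)

set.seed(1)

# Synthetic stand-in data (assumption): 100 rows, 3 numeric predictors
n <- 100
data_x <- data.frame(col2 = rnorm(n), col3 = rnorm(n), col4 = rnorm(n))

# Binary outcome with valid R names as levels, loosely related to col2
data_y <- factor(ifelse(data_x$col2 + rnorm(n) > 0, "yes", "no"))

# Cross-validation that reports ROC AUC; this requires class probabilities
ctrl <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)

# Fit a simple classifier and select by ROC, the metric mentioned in this thread
fit <- train(x = data_x, y = data_y, method = "glm",
             metric = "ROC", trControl = ctrl)

# Cross-validated ROC AUC of the fitted model
roc_value <- max(fit$results$ROC)
print(roc_value)
```

If a call like this fails or warns on the real data_x and data_y, the same problem is likely to show up inside the GA fitness function, so it is cheaper to catch it here first.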
OK.