rdiaz02/varSelRF

variable names in importance plots

Opened this issue · 0 comments

Requested by Xiaowei Guan: "there are not variable names only the dots".
My answer back then:
Right now, no, there is no direct way to get those names in the plot.
Let me elaborate:

a) In the second plot (OOB error vs. number of variables) that would
not make sense, since it is not individual variables that are plotted.

b) In the first plot, we are plotting just the importances of the very
first forest. That is something you could get from the usual random
forest, as you show.

c) b) is really not a very satisfactory answer. It should be easy to
modify the code of plot.varSelRF, because it just makes a simple call
to dotchart:

> varSelRF:::plot.varSelRF
function (x, nvar = NULL, which = c(1, 2), ...)
{
   if (length(which) == 2 && dev.interactive()) {
       op <- par(ask = TRUE, las = 1)
   }
   else {
       op <- par(las = 1)
   }
   on.exit(par(op))
   if (is.null(nvar))
       nvar <- min(30, length(x$initialOrderedImportances))
   show <- c(FALSE, FALSE)
   show[which] <- TRUE
   if (show[1]) {
       dotchart(rev(x$initialOrderedImportances[1:nvar]), 
                           main = "Initial importances",
                           xlab = "Importances (unscaled)")
   }
   if (show[2]) {
       ylim <- c(0, max(0.5, x$selec.history$OOB))
       plot(x$selec.history$Number.Variables, x$selec.history$OOB,
           type = "b", xlab = "Number of variables used", ylab = "OOB error",
           log = "x", ylim = ylim, ...)
       lines(x$selec.history$Number.Variables, x$selec.history$OOB +
           2 * x$selec.history$sd.OOB, lty = 2)
       lines(x$selec.history$Number.Variables, x$selec.history$OOB -
           2 * x$selec.history$sd.OOB, lty = 2)
   }
}

So we could just modify the call to dotchart to add labels (dotchart takes
a labels argument). For now, however, you can either use random forest
directly, or modify the code.
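
As a possible workaround until the plot method is changed, here is a minimal sketch of the "add labels" idea. It relies on the fact that dotchart() uses names(x) as labels when it is given a named vector. The helper named_top_importances is hypothetical (it is not part of varSelRF), and the mock matrix stands in for x$initialImportances, which is assumed here to be a one-column matrix with variable names as row names, as randomForest::importance() returns. Check your own object with str() before relying on that.

```r
# Hypothetical helper, not part of varSelRF: take a one-column importance
# matrix with variable names as row names (ASSUMED layout) and return the
# top `nvar` importances as a named vector, in decreasing order.
named_top_importances <- function(imp, nvar = 30) {
    v <- setNames(as.numeric(imp[, 1]), rownames(imp))
    v <- sort(v, decreasing = TRUE)
    v[seq_len(min(nvar, length(v)))]
}

# Mock data standing in for x$initialImportances (made-up values).
imp <- matrix(c(0.02, 0.15, 0.07), ncol = 1,
              dimnames = list(c("geneA", "geneB", "geneC"),
                              "MeanDecreaseAccuracy"))
top <- named_top_importances(imp, nvar = 2)

# dotchart() takes its labels from names(x) for a named vector;
# rev() mirrors the ordering used in plot.varSelRF.
dotchart(rev(top), main = "Initial importances",
         xlab = "Importances (unscaled)")
```

With a real fit, the call would be something like named_top_importances(x$initialImportances), again assuming that component has the layout described above.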