rfvs-regression

The goal of rfvs-regression is to compare random forest variable selection techniques for continuous outcomes

Datasets included

The experiment we ran involves three steps:

Note: We use both axis-based and oblique random forests for regression in step 2.

We assume that better variable selection leads to better prediction accuracy. Results from the experiment are below.

Similar results, but using mean and standard error

Takeaways:

Variable selection method jiang has consistently high prediction accuracy but scales poorly to large datasets in terms of computational efficiency. E.g., the max time that this method required was 2 days, 23 hours, 40 minutes, 9 seconds
Variable selection method aorsf has very high prediction accuracy when the downstream model is an oblique random forest, and does not become computationally intractable with larger data. E.g., the max time this method required was 1 hour, 18 minutes, 26 seconds