Selection bias problem

We provide two main programs that simulate the selection bias problem using the random forest algorithm. In mainGood the analysis is done correctly: feature selection happens only after the data has been divided into training and test sets. The whole pipeline looks like this:

Create dataset
Generate k-fold indices
Divide into training and test sets
Feature selection
Train random forest on training data
Predict on test data
Visualize data

In the second program the analysis is done incorrectly: feature selection is run on the whole dataset before the k-fold split, so the test data influences which features are chosen:

Create dataset
Feature selection
Generate k-fold indices
Divide into training and test sets
Train random forest on training data
Predict on test data
Visualize data
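
The difference between the two orderings can be illustrated with a minimal sketch. This is not the code of this repository: it is written in Python with only the standard library, it replaces the random forest with a simple nearest-centroid classifier, and every name in it (make_dataset, kfold_indices, select_features, fold_accuracy, cross_validate) is hypothetical. The dataset size, number of features and number of folds are also assumptions. The data is pure noise, so no feature is truly predictive and an honest accuracy estimate should sit near chance.

```python
import random
import statistics


def make_dataset(n_samples, n_features, rng):
    """Pure-noise features with balanced random labels (nothing is truly predictive)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    rng.shuffle(y)
    return X, y


def kfold_indices(n_samples, k, rng):
    """Shuffle the row indices and deal them into k disjoint test folds."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def select_features(X, y, rows, n_keep):
    """Keep the n_keep features whose class means differ most on the given rows."""
    def score(j):
        class0 = [X[i][j] for i in rows if y[i] == 0]
        class1 = [X[i][j] for i in rows if y[i] == 1]
        return abs(statistics.fmean(class0) - statistics.fmean(class1))
    return sorted(range(len(X[0])), key=score, reverse=True)[:n_keep]


def fold_accuracy(X, y, train, test, feats):
    """Nearest-centroid classifier: an assumed stand-in for the random forest."""
    centroids = {}
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroids[label] = [statistics.fmean(X[i][j] for i in members) for j in feats]
    correct = 0
    for i in test:
        dist = {
            label: sum((X[i][j] - c) ** 2 for j, c in zip(feats, centroids[label]))
            for label in (0, 1)
        }
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(test)


def cross_validate(X, y, folds, n_keep, select_inside_fold):
    """Mean test accuracy over the folds, with selection inside or before the split."""
    all_rows = [i for fold in folds for i in fold]
    # Bad ordering: features are chosen once, using every row (test rows included).
    global_feats = None if select_inside_fold else select_features(X, y, all_rows, n_keep)
    accuracies = []
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        # Good ordering: features are chosen again in each fold, from training rows only.
        feats = select_features(X, y, train, n_keep) if select_inside_fold else global_feats
        accuracies.append(fold_accuracy(X, y, train, test, feats))
    return statistics.fmean(accuracies)


rng = random.Random(0)
X, y = make_dataset(n_samples=50, n_features=1000, rng=rng)
folds = kfold_indices(n_samples=50, k=5, rng=rng)

good_acc = cross_validate(X, y, folds, n_keep=10, select_inside_fold=True)
bad_acc = cross_validate(X, y, folds, n_keep=10, select_inside_fold=False)
print("selection inside each fold:", good_acc)
print("selection before the split:", bad_acc)
```

Because the labels are random, the first figure should be expected to stay close to 0.5, while the second tends to look much better than chance. That gap is the selection bias: the features were picked using the same samples that are later used for testing.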

========

Authors: Claudia Buhigas, Pablo Vicente and Jose Alejandro Romero

MSc Bioinformatics 2014-2015 - R & statistics exercise

Presentation - https://docs.zoho.com/show/publish/pnmm5185d5e8a3ce8459e9903c0954bdfb2d9
