We created two main programs that simulate the selection bias problem using the random forest algorithm. In mainGood we do it the right way, and the whole pipeline is:
- Create dataset
- Generate k-fold indices
- Divide into train and test sets
- Feature selection (on the training data only)
- Train a random forest on the training data
- Predict on the test data
- Visualize data
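
The k-fold index generation and train/test division steps above can be sketched as follows. This is a minimal Python illustration, not the authors' R implementation. The helper names `kfold_indices` and `train_test_splits`, the seed, and the round-robin fold assignment are assumptions made for the example.

```python
import random


def kfold_indices(n_samples, k, seed=0):
    # Hypothetical helper: shuffle indices once, then deal them round-robin
    # into k folds so that fold sizes differ by at most one.
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def train_test_splits(n_samples, k, seed=0):
    # Hypothetical helper: yield one (train, test) pair of index lists per fold.
    folds = kfold_indices(n_samples, k, seed)
    for f, test in enumerate(folds):
        # The training set is every fold except the held-out one.
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield sorted(train), sorted(test)


# In the good pipeline, feature selection runs inside this loop and sees
# only `train`, so held-out samples never influence the chosen features.
splits = list(train_test_splits(n_samples=10, k=5, seed=42))
for train, test in splits:
    print(len(train), len(test))  # prints "8 2" for each of the 5 folds
```

Every sample lands in exactly one test fold, which is what makes the averaged test accuracy an honest estimate, provided nothing fitted on the full dataset (such as a feature ranking) leaks into training.
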
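In the second program we do it the wrong way, performing feature selection on the whole dataset before the k-fold split. Because the features are then chosen with the help of samples that later serve as test data, the cross-validated accuracy is optimistically biased. The pipeline is: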
- Create dataset
- Feature selection (on the whole dataset)
- Generate k-fold indices
- Divide into train and test sets
- Train a random forest on the training data
- Predict on the test data
- Visualize data
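
To see why the ordering matters, the sketch below builds a dataset whose labels are pure noise, so any honest accuracy estimate should sit near 50%, and runs both orderings on it. It is a minimal Python illustration under stated assumptions, not the authors' R code: features are ranked by the absolute difference of class means, and a nearest-centroid classifier stands in for the random forest to keep the example dependency-free. The dataset sizes and all function names are hypothetical.

```python
import random

N_SAMPLES, N_FEATURES, N_KEEP, K_FOLDS = 50, 1000, 20, 5  # assumed sizes


def make_noise_dataset(n, p, seed):
    # Labels are independent of the features, so the true accuracy is 50%.
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]  # balanced classes 0 and 1
    return X, y


def select_features(X, y, rows, n_keep):
    # Assumption: rank features by |mean(class 1) - mean(class 0)| over `rows`.
    n1 = sum(y[r] for r in rows)
    n0 = len(rows) - n1
    scores = []
    for j in range(len(X[0])):
        s1 = sum(X[r][j] for r in rows if y[r] == 1)
        s0 = sum(X[r][j] for r in rows if y[r] == 0)
        scores.append((abs(s1 / n1 - s0 / n0), j))
    return [j for _, j in sorted(scores, reverse=True)[:n_keep]]


def centroid_accuracy(X, y, train, test, feats):
    # Stand-in for the random forest: nearest-centroid classifier fitted on `train`.
    centroids = {}
    for c in (0, 1):
        members = [r for r in train if y[r] == c]
        centroids[c] = [sum(X[r][j] for r in members) / len(members) for j in feats]
    correct = 0
    for r in test:
        dist = {
            c: sum((X[r][j] - m) ** 2 for j, m in zip(feats, centroids[c]))
            for c in (0, 1)
        }
        correct += min(dist, key=dist.get) == y[r]
    return correct / len(test)


def cv_accuracy(X, y, select_before_split, seed):
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::K_FOLDS] for f in range(K_FOLDS)]
    if select_before_split:
        # Bad ordering: the future test samples help choose the features.
        feats = select_features(X, y, range(len(y)), N_KEEP)
    accs = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        if not select_before_split:
            # Good ordering: features chosen from this fold's training rows only.
            feats = select_features(X, y, train, N_KEEP)
        accs.append(centroid_accuracy(X, y, train, test, feats))
    return sum(accs) / len(accs)


good, bad = [], []
for seed in range(3):
    X, y = make_noise_dataset(N_SAMPLES, N_FEATURES, seed)
    good.append(cv_accuracy(X, y, select_before_split=False, seed=seed))
    bad.append(cv_accuracy(X, y, select_before_split=True, seed=seed))
mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
print(f"feature selection inside the folds: {mean_good:.2f}")
print(f"feature selection before the folds: {mean_bad:.2f}")
```

With noise labels, the inside-the-folds estimate should hover around 0.5, while selecting first typically reports a much higher accuracy even though there is no real signal to find. This gap is the selection bias the two programs are meant to expose.
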
========
Authors: Claudia Buhigas, Pablo Vicente and Jose Alejandro Romero
MSc Bioinformatics 2014-2015 - R & statistics exercise
Presentation - https://docs.zoho.com/show/publish/pnmm5185d5e8a3ce8459e9903c0954bdfb2d9