Applied Predictive Modeling with caret
Create synthetic data using twoClassSim
Quickly explore the data using skimr and xray
Split the dataset into train/test with an index
Setting up train control in caret
Cross-validation method and settings
Subsampling to deal with class imbalance (mentioned but not implemented)
Placeholder regression example
Classification example
Logistic Regression (glm), Elastic Net (glmnet), Random Forest (ranger)
Using summary, variable importance, and plot on the fit object
Prediction on unseen data: class; class probability
In-sample: ROC, Sensitivity (true positive rate), Specificity (true negative rate)
Confusion matrices
Model dissimilarity using Jaccard distance
Linear ensembles
Meta-Model ensembles
Recursive Feature Elimination
Simulated Annealing
Genetic Algorithm
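
The first several items above (synthetic data, index-based split, train control, a logistic regression fit, prediction, confusion matrix) can be sketched roughly as follows. This is a minimal illustration rather than the tutorial's own code: the object names (dat, idx, ctrl, fit, pred_class, pred_prob), the seed, the sample size, the 75/25 split, and the 5-fold setting are all assumptions chosen for the example.

```r
# Sketch only: object names and all numeric settings are illustrative assumptions.
library(caret)

set.seed(42)
dat <- twoClassSim(n = 500)   # synthetic two-class data; the outcome column is `Class`

# Stratified train/test split using an index
idx      <- createDataPartition(dat$Class, p = 0.75, list = FALSE)
training <- dat[idx, ]
testing  <- dat[-idx, ]

# Cross-validation settings; twoClassSummary reports ROC, Sens, and Spec
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

# Logistic regression; method = "glmnet" or "ranger" would fit the other two
# models from the outline (each needs its own package installed)
fit <- train(Class ~ ., data = training,
             method = "glm",
             metric = "ROC", trControl = ctrl)

# Prediction on unseen data: class labels and class probabilities
pred_class <- predict(fit, newdata = testing)
pred_prob  <- predict(fit, newdata = testing, type = "prob")

# Confusion matrix on the held-out set
confusionMatrix(pred_class, testing$Class)
```

Keeping the resampling settings in a single trainControl object means the same ctrl can be passed to every train call, so the models are compared under the same scheme.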