
Binary classification problem on a highly non-linear dataset.


MADELON

MADELON is an artificial dataset containing data points grouped in 32 clusters placed on the vertices of a five-dimensional hypercube and randomly labeled +1 or -1. The five dimensions constitute 5 informative features. 15 linear combinations of those features were added to form a set of 20 (redundant) informative features. Based on those 20 features, one must separate the examples into the 2 classes (corresponding to the ±1 labels). A number of distractor features called 'probes', which have no predictive power, were also added. The order of the features and patterns was randomized.
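The recipe described above can be sketched in a few lines of plain Python. This is only an illustration, not the generator used to build the real dataset: the function name `make_madelon_like`, the Gaussian cluster shape, the cluster spread, the uniform weights for the linear combinations, and the Gaussian probes are all assumptions made for the example.

```python
import itertools
import random


def make_madelon_like(n_samples=200, n_informative=5, n_redundant=15,
                      n_probes=480, cluster_std=1.0, seed=0):
    """Generate a toy dataset that follows the MADELON recipe (hypothetical helper)."""
    rng = random.Random(seed)

    # One cluster centre per vertex of the hypercube: 2**5 = 32 vertices.
    vertices = list(itertools.product((-1.0, 1.0), repeat=n_informative))
    # Each cluster (not each individual point) gets a random +1 / -1 label.
    vertex_labels = [rng.choice((-1, 1)) for _ in vertices]

    # Assumed: random weights define the redundant linear combinations.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_informative)]
               for _ in range(n_redundant)]

    n_features = n_informative + n_redundant + n_probes
    # A single random column permutation, shared by every row.
    column_order = list(range(n_features))
    rng.shuffle(column_order)

    X, y = [], []
    for _ in range(n_samples):
        # Pick a cluster at random, so the row order is random as well.
        k = rng.randrange(len(vertices))
        informative = [rng.gauss(c, cluster_std) for c in vertices[k]]
        redundant = [sum(w * v for w, v in zip(row, informative))
                     for row in weights]
        # Assumed: probes are pure noise, independent of the label.
        probes = [rng.gauss(0.0, 1.0) for _ in range(n_probes)]
        full_row = informative + redundant + probes
        X.append([full_row[j] for j in column_order])
        y.append(vertex_labels[k])
    return X, y, column_order


X, y, column_order = make_madelon_like()
print(len(X), len(X[0]))   # number of rows, number of columns
print(sorted(set(y)))      # labels that occur in the sample
```

Because the labels are attached to clusters rather than to any single direction in feature space, no linear boundary separates the two classes well, which is what makes the problem highly non-linear.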


Number of variables/features/attributes:

Real: 20

Probes: 480

Total: 500

http://archive.ics.uci.edu/ml/datasets/madelon
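A minimal sketch of reading the downloaded files, assuming the data file holds one example per line as whitespace-separated numbers and the labels file holds one +1 or -1 per line. The file names `madelon_train.data` and `madelon_train.labels` mentioned in the usage comment, the helper name `parse_madelon`, and the sample values are assumptions for illustration, not taken from this repository.

```python
def parse_madelon(data_text, labels_text):
    """Parse feature rows and labels from the raw text of the two files."""
    # Each non-empty line of the data file is one example.
    X = [[float(tok) for tok in line.split()]
         for line in data_text.splitlines() if line.strip()]
    # Each non-empty line of the labels file is a single +1 / -1 value.
    y = [int(line) for line in labels_text.splitlines() if line.strip()]
    if len(X) != len(y):
        raise ValueError(f"{len(X)} examples but {len(y)} labels")
    return X, y


# Made-up sample with 3 features instead of 500, only to show the format.
sample_data = "485 477 537\n483 458 460\n"
sample_labels = "-1\n1\n"
X_sample, y_sample = parse_madelon(sample_data, sample_labels)
print(X_sample)
print(y_sample)

# With the real files (assumed names), the call would look like:
# with open("madelon_train.data") as f_data, open("madelon_train.labels") as f_lab:
#     X_train, y_train = parse_madelon(f_data.read(), f_lab.read())
```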


Some results:

[Figures: LDA results table, LDA chart, hypercube illustration]

[Figures: SVM results table, SVM chart]

[Figures: PCA + SVM results table, PCA + SVM summary table, PCA + SVM chart]

[Figures: best results table, best results chart]