Meta Algorithm

A universal meta-algorithm for machine learning projects, as I carry it out myself.

Step 0: Define the problem

Define input and output
Decide if classification or regression
Decide if supervised or unsupervised
Define evaluation metric

Step 1: Collect data

Define the population
Choose a kind of study (experiment or survey)
If survey: define sampling method
If experiment: define assignment method / kind of manipulation and control for confounding factors
Choose sample size
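
The sampling and assignment choices above can be sketched with the standard library alone. This is a minimal illustration, assuming a hypothetical sampling frame of 1000 named units, a sample size of 100 and a simple random design; a real study would use only the survey part or only the experiment part, and other designs (stratified, cluster) need different code.

```python
import random

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Hypothetical sampling frame standing in for the defined population.
population = [f"person_{i}" for i in range(1000)]

# Survey: simple random sampling without replacement.
sample = rng.sample(population, k=100)

# Experiment: random assignment of the sampled units to two groups,
# which spreads unobserved factors evenly across both on average.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:50], shuffled[50:]

print(len(sample), len(treatment), len(control))
```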

Step 2: Analyze data

Take a look at the shape
Take a quick glance
Analyze the most important statistics of the variables (mean, median, variance, missing values)
Analyze each variable in depth: statistics, distribution
Analyze relationships: scatterplot matrix with correlation coefficient
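
A first pass over these analysis steps could look like the sketch below. It assumes the data fits into a pandas DataFrame; the toy dataset and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names are illustrative only.
df = pd.DataFrame({
    "age": [23, 35, 41, np.nan, 29, 52],
    "income": [32000, 54000, 61000, 48000, np.nan, 90000],
    "city": ["Berlin", "Munich", "Berlin", "Hamburg", "Munich", "Berlin"],
})

print(df.shape)   # shape: (rows, columns)
print(df.head())  # quick glance at the first rows

numeric = df.select_dtypes(include="number")

# Most important statistics per numeric variable.
summary = pd.DataFrame({
    "mean": numeric.mean(),
    "median": numeric.median(),
    "variance": numeric.var(),
    "missing": numeric.isna().sum(),
})
print(summary)

# Pairwise Pearson correlation coefficients; missing values are
# excluded pair by pair.
correlations = numeric.corr()
print(correlations)
```

With matplotlib installed, `pd.plotting.scatter_matrix(numeric)` draws the scatterplot matrix that goes with the correlation table.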

Step 4: Select features

Step 5: Clean data

Analyze columns for missing values and outliers -> drop column / replace with values / do nothing
Analyze rows for missing values and outliers -> drop row / replace with values / do nothing
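
One possible way to act on these decisions with pandas is sketched below. The 50% missing-value threshold, the 1.5 * IQR outlier rule and the median replacement are assumptions chosen for illustration, not rules prescribed by this algorithm; the data is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values and one implausible weight.
df = pd.DataFrame({
    "height_cm": [170.0, 165.0, np.nan, 180.0, 175.0, 172.0],
    "weight_kg": [70.0, np.nan, 80.0, 85.0, 77.0, 500.0],
    "notes": [np.nan, np.nan, np.nan, np.nan, "ok", np.nan],
})

# Columns: drop those where more than half of the values are missing.
missing_share = df.isna().mean()
cleaned = df.drop(columns=missing_share[missing_share > 0.5].index)

# Rows: drop those whose weight lies outside 1.5 * IQR of the quartiles.
q1, q3 = cleaned["weight_kg"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (cleaned["weight_kg"] < q1 - 1.5 * iqr) | (
    cleaned["weight_kg"] > q3 + 1.5 * iqr
)
cleaned = cleaned[~is_outlier]

# Remaining gaps: replace with the column median.
cleaned = cleaned.fillna(cleaned.median())
print(cleaned)
```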

Step 8: Data augmentation (optional)

Step 9: Feature engineering (optional)

Step 10: Dummy encoding

Identify categorical non-ordinal features
Create dummy variables for those features
Drop original features
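
With pandas, the three steps above collapse into one call, since `pd.get_dummies` removes the original column once it is encoded. The example data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: "color" is categorical and non-ordinal.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size_cm": [10, 12, 9, 11],
})

categorical = ["color"]

# Creates one indicator column per category and drops "color" itself.
encoded = pd.get_dummies(df, columns=categorical)
print(encoded)
```

Passing `drop_first=True` leaves out one category per feature, which avoids perfectly collinear columns in linear models.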

Step 10: Feature scaling (optional)

Identify skewed variables
Take log of those variables
Standardize or normalize features
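
The following sketch walks through these steps with NumPy on a made-up, right-skewed feature. The `skewness` helper is written here for illustration, and `log1p` is used instead of a plain log so that zeros stay defined; both are assumptions, not part of the original algorithm.

```python
import numpy as np

# Hypothetical right-skewed feature with one very large value.
x = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 200.0])

def skewness(values):
    """Moment-based sample skewness (third standardized moment)."""
    centered = values - values.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# Log transform reduces the skew; log1p(x) = log(1 + x).
log_x = np.log1p(x)
print(skewness(x), skewness(log_x))

# Standardize: zero mean, unit variance.
standardized = (log_x - log_x.mean()) / log_x.std()

# Normalize: rescale to the interval [0, 1].
normalized = (log_x - log_x.min()) / (log_x.max() - log_x.min())
```

To avoid leaking information, the mean, standard deviation, minimum and maximum should be computed on the training data only and then reused for validation and test data.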

Step 11: Compress data with PCA (optional)

Choose number of components
Fit components
Interpret components
Compress data
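
A sketch of these four steps with scikit-learn's `PCA` follows. The synthetic data and the 95% explained-variance threshold for choosing the number of components are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 5 features driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

# Choose number of components: the smallest number that explains
# at least 95% of the variance (threshold is an assumption).
full = PCA().fit(X)
cumulative = np.cumsum(full.explained_variance_ratio_)
n_components = int(np.searchsorted(cumulative, 0.95) + 1)

# Fit components.
pca = PCA(n_components=n_components).fit(X)

# Interpret components: each row holds the loadings of one component
# on the original features.
print(pca.components_)

# Compress data.
X_compressed = pca.transform(X)
print(X.shape, "->", X_compressed.shape)
```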

Step 12: Apply kernel function (optional)

Step 13: Split data

Training data: 70%
Validation data: 15%
Test data: 15%
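
The 70/15/15 split can be produced with two calls to scikit-learn's `train_test_split`: first set aside 30% of the data, then cut that part in half. The arrays below are placeholders for real features and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples with 2 features and a binary label.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# 70% training data, 30% held back.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Split the held-back 30% evenly into validation and test data.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```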

Step 14: Choose model, loss-function, learning-algorithm

Choose model
Supervised regression: linear model
Supervised classification: logistic model, SVM, decision trees, naive Bayes, neural network
Unsupervised clustering: k-means, hierarchical clustering, DBSCAN, Gaussian mixture model
Choose loss-function
Choose learning-algorithm

Step 15: Choose hyperparameters

Choose hyperparameters of model
Choose hyperparameters of loss-function
Choose hyperparameters of learning-algorithm
Choose sample size

Step 16: Fit model

Step 17: Evaluate model

Choose metric
Classification: accuracy, precision, recall, F-score, loss
Regression: (adjusted) coefficient of determination (R²), sum of squared residuals
Clustering: adjusted Rand score, silhouette coefficient
Evaluate model on training and cross-validation set
Check bias/underfitting and variance/overfitting
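
To make the classification metrics concrete, here is a small worked example with NumPy. The labels and predictions are made up, and the F-score shown is F1, the harmonic mean of precision and recall.

```python
import numpy as np

# Hypothetical true labels and model predictions (1 = positive class).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Confusion-matrix counts.
tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # true negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```

Computing the same metric on training and validation data then shows the two failure modes: a low score on both hints at bias/underfitting, a high training score with a clearly lower validation score hints at variance/overfitting.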

Go to: Step 15

Step 18: Choose hyperparameters with best cross-validation score

Go to: Step 14

Step 19: Choose model, loss-function and learning-algorithm with best cross-validation score

Step 20: Evaluate model on test data
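
The two "Go to" loops and the final test evaluation can be sketched as nested loops with scikit-learn. The candidate models, their hyperparameter values and the synthetic dataset are arbitrary choices for illustration, and the score used for selection is accuracy on the held-out validation data from the 70/15/15 split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, split 70/15/15.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=0
)

# Hypothetical candidates: model class plus hyperparameter settings.
candidates = {
    "logistic": (LogisticRegression, [
        {"C": 0.1, "max_iter": 1000},
        {"C": 1.0, "max_iter": 1000},
    ]),
    "tree": (DecisionTreeClassifier, [
        {"max_depth": 2, "random_state": 0},
        {"max_depth": 5, "random_state": 0},
    ]),
}

results = []
for name, (model_class, grid) in candidates.items():  # loop back to Step 14
    for params in grid:                               # loop back to Step 15
        model = model_class(**params).fit(X_train, y_train)  # Step 16
        val_score = model.score(X_val, y_val)                # Step 17
        train_score = model.score(X_train, y_train)
        results.append((val_score, train_score, name, params, model))

# Steps 18 and 19: keep the combination with the best validation score.
best_val, best_train, best_name, best_params, best_model = max(
    results, key=lambda r: r[0]
)

# Step 20: touch the test data exactly once, with the chosen model.
test_accuracy = best_model.score(X_test, y_test)
print(best_name, best_params, best_val, test_accuracy)
```

Because the test data plays no part in the selection, `test_accuracy` is an estimate of performance on unseen data, while `best_val` is optimistically biased by the search.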

Step 21: Sanity check model by running inference on a random example (optional)

Step 22: Interpret model (optional)

Step 23: Deploy / save model
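
One simple way to save a fitted scikit-learn model is Python's `pickle` module, sketched below. The tiny linear model and the file name `model.pkl` are placeholders; a temporary directory is used only so the example cleans up after itself.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder model: fits y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"  # hypothetical file name

    # Save the fitted model.
    with path.open("wb") as f:
        pickle.dump(model, f)

    # Load it again, as a deployment script would.
    with path.open("rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions.
same = np.allclose(model.predict(X), restored.predict(X))
print(same)
```

Pickle files should only be loaded from trusted sources and with the same library versions that wrote them.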