Probability and Statistics for Machine Learning

This repo is home to the code that accompanies the Probability and Statistics for Machine Learning curriculum, which provides a comprehensive overview of the subjects across probability and statistics that underlie contemporary machine learning approaches, including deep learning and other artificial intelligence techniques.

Probability Curriculum

  • Segment 1: Introduction to Probability

    • What Probability Theory Is
    • A Brief History: Frequentists vs Bayesians
    • Applications of Probability to Machine Learning
    • Random Variables
    • Discrete vs Continuous Variables
    • Probability Mass and Probability Density Functions
    • Expected Value
    • Measures of Central Tendency: Mean, Median, and Mode
    • Quantiles: Quartiles, Deciles, and Percentiles
    • The Box-and-Whisker Plot
    • Measures of Dispersion: Variance, Standard Deviation, and Standard Error
    • Measures of Relatedness: Covariance and Correlation
    • Marginal and Conditional Probabilities
    • Independence and Conditional Independence
  • Segment 2: Distributions in Machine Learning

    • Uniform
    • Gaussian: Normal and Standard Normal
    • The Central Limit Theorem
    • Log-Normal
    • Exponential and Laplace
    • Binomial and Multinomial
    • Poisson
    • Mixture Distributions
    • Preprocessing Data for Model Input
  • Segment 3: Information Theory

    • What Information Theory Is
    • Self-Information
    • Nats, Bits and Shannons
    • Shannon and Differential Entropy
    • Kullback-Leibler Divergence
    • Cross-Entropy

Statistics Curriculum

  • Segment 1: Frequentist Statistics

    • Frequentist vs Bayesian Statistics
    • Review of Relevant Probability Theory
    • z-scores and Outliers
    • p-values
    • Comparing Means with t-tests
    • Confidence Intervals
    • ANOVA: Analysis of Variance
    • Pearson Correlation Coefficient
    • R-Squared Coefficient of Determination
    • Correlation vs Causation
    • Correcting for Multiple Comparisons
  • Segment 2: Regression

    • Features: Independent vs Dependent Variables
    • Linear Regression to Predict Continuous Values
    • Fitting a Line to Points on a Cartesian Plane
    • Ordinary Least Squares
    • Logistic Regression to Predict Categories
  • Segment 3: Bayesian Statistics

    • (Deep) ML vs Frequentist Statistics
    • When to Use Bayesian Statistics
    • Prior Probabilities
    • Bayes’ Theorem
    • PyMC3 Notebook
    • Resources for Further Study of Probability and Statistics