/hse-applied-statistics-2018

Materials for the HSE course "Applied Statistics in Machine Learning" taught during 2018.

Primary language: Jupyter Notebook

Applied Statistics in Machine Learning

  • Dates: Fall 2017, Fall 2018
  • Venue: Higher School of Economics, Moscow, Russia
  • Level: Bachelor Students
  • Program: Machine Learning

Authors:

  • Alexey Artemov
  • Denis Derkach
  • Maxim Sharaev
  • Ekaterina Kondratyeva
  • Vlad Belavin

Course Contents

This course covers the statistical concepts and methods needed for data analysis across scientific disciplines. We start with resampling techniques, namely Monte Carlo simulation and the bootstrap, and show how they underpin modern statistical inference. Next, we cover parametric estimation, hypothesis testing, nonparametric estimation, regression analysis, and the design of experiments. More advanced topics include distances between distributions, hypothesis-testing tools such as the Neyman-Pearson lemma and A/B testing, and nonparametric tests. Practical applications in fields such as neuroscience show how these methods are used on real data. By the end of the course, students will have a solid foundation in statistical theory and methodology, enabling them to analyze data critically, make informed decisions, and contribute to research and problem solving in their own fields.

  • Lecture 1. Course introduction.
  • Lecture 2. Resampling. Monte Carlo simulation. Bootstrap. Confidence intervals. Multiple comparisons correction. Bagging in machine learning.
  • Lecture 3. Parametric estimation. Maximum likelihood method and its properties. Delta method. The case of a vector parameter.
  • Lecture 4. Distances between distributions. f-divergences. Total variation distance. Kullback-Leibler divergence. Jensen-Shannon divergence. χ² distance. Wasserstein distance.
  • Lecture 5. Hypothesis testing (Part 1). Statistical hypotheses and statistical criteria. Characteristics of the criteria. Wald test. Neyman-Pearson lemma. Student's test.
  • Lecture 6. Hypothesis testing (Part 2). Neyman-Pearson lemma. The likelihood ratio criterion. Introduction to A/B testing. Sequential likelihood ratio criterion. Non-parametric criteria.
  • Lecture 7. Nonparametric estimation. Nonparametric density estimation. Losses, risk. Kernel density estimation.
  • Lecture 8. Regression. Standard linear regression. Least squares method.
  • Lecture 9. Design of experiments.
  • Lecture 10. Correlation structure. Multivariate Gaussian models. Inverse covariance (precision) matrix. Permutation. Block optimization approach. Graphical Lasso.
  • Lecture 11. ICA and matrix decompositions in neuroscience.
  • Lecture 12. Statistical inference.
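
As a flavour of the resampling material in Lecture 2, here is a minimal sketch of a percentile bootstrap confidence interval for the sample mean. It is not taken from the course notebooks: the function name `bootstrap_ci`, its parameters, and the toy data are all assumptions made for illustration, and it uses only the Python standard library.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample).

    Hypothetical helper written for this README, not part of the course code.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Toy data (made up): 200 draws from a normal distribution with mean 5 and std 2.
data_rng = random.Random(42)
data = [data_rng.gauss(5.0, 2.0) for _ in range(200)]

low, high = bootstrap_ci(data)
print(f"sample mean = {statistics.mean(data):.3f}")
print(f"95% bootstrap CI = ({low:.3f}, {high:.3f})")
```

The percentile interval is the simplest bootstrap variant. Passing a different `stat` (for example `statistics.median`) reuses the same resampling loop for other estimators.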