For better online viewing of the material

License: Creative Commons Zero v1.0 Universal (CC0-1.0)

FYS-STK3155/4155 Applied Data Analysis and Machine Learning, http://www.uio.no/studier/emner/matnat/fys/FYS-STK4155/index-eng.html

This site contains all material relevant for the course on Applied Data Analysis and Machine Learning, FYS-STK3155/4155 at the University of Oslo, Norway.

Introduction

Probability theory and statistical methods play a central role in science. Nowadays we are surrounded by huge amounts of data. For example, there are about one trillion web pages; more than one hour of video is uploaded to YouTube every second, amounting to years of content every day; and the genomes of thousands of people, each more than a billion base pairs long, have been sequenced by various labs. This deluge of data calls for automated methods of data analysis, which is exactly what machine learning aims to provide.

Learning outcomes

This course aims to give you insight into and knowledge of many of the central algorithms used in Data Analysis and Machine Learning. The course is project based: through various numerical projects, normally three, you will be exposed to fundamental research problems in these fields, with the aim of reproducing state-of-the-art scientific results. Both supervised and unsupervised methods will be covered. You will learn to develop and structure large codes for studying the different systems to which Machine Learning is applied, get acquainted with computing facilities, and learn to handle large scientific projects. Good scientific and ethical conduct is emphasized throughout the course. More specifically, after this course you will

  • Learn about basic data analysis, statistical analysis, Bayesian statistics, Monte Carlo sampling, data optimization and machine learning;
  • Be capable of extending the acquired knowledge to other systems and cases;
  • Have an understanding of central algorithms used in data analysis and machine learning;
  • Understand linear methods for regression and classification, from ordinary least squares, via Lasso and Ridge to Logistic regression;
  • Learn about various neural networks and deep learning methods for supervised and unsupervised learning;
  • Learn about decision trees and random forests;
  • Learn about support vector machines and kernel transformations;
  • Learn about reduction of data sets, from PCA to clustering, using supervised and unsupervised methods;
  • Work on numerical projects to illustrate the theory. The projects play a central role and you are expected to know modern programming languages like Python or C++.
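
As an illustration of the linear regression methods named above, here is a minimal sketch of ordinary least squares and Ridge regression written with NumPy. It is not taken from the course material: the function names (`design_matrix`, `ols_fit`, `ridge_fit`), the synthetic quadratic data and the penalty value `lam=1.0` are all assumptions chosen for the example.

```python
import numpy as np

def design_matrix(x, degree):
    """Polynomial design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def ols_fit(X, y):
    """Ordinary least squares via the pseudoinverse of X."""
    return np.linalg.pinv(X) @ y

def ridge_fit(X, y, lam):
    """Ridge regression: solve (X^T X + lam * I) beta = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data (assumed for illustration): a noisy second-order polynomial.
rng = np.random.default_rng(2024)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.05 * rng.standard_normal(x.size)

X = design_matrix(x, degree=2)
beta_ols = ols_fit(X, y)
beta_ridge = ridge_fit(X, y, lam=1.0)

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

The Ridge penalty shrinks the coefficient vector towards zero, so its norm is never larger than that of the OLS solution. In this sketch the intercept is penalized along with the other coefficients; in practice one often centers the data and leaves the intercept unpenalized.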

Prerequisites

Basic knowledge of programming and mathematics, with an emphasis on linear algebra. Knowledge of Python and/or C++ as programming languages is strongly recommended, and experience with Jupyter notebooks is recommended. Required courses are the equivalents of the University of Oslo mathematics courses MAT1100, MAT1110 and MAT1120 and at least one of the corresponding computing and programming courses INF1000/INF1110 or MAT-INF1100/MAT-INF1100L/BIOS1100/KJM-INF1100. Most universities nowadays offer a basic programming course (often compulsory) in which Python is the programming language used.

The course has two central parts

  1. Statistical analysis and optimization of data
  2. Machine learning

These topics will be scattered throughout the course and may not necessarily be taught separately. Rather, we will often take an approach (during the lectures and project/exercise sessions) where, say, elements from statistical data analysis are mixed with specific Machine Learning algorithms.

Statistical analysis and optimization of data

The following topics will be covered

  • Basic concepts, expectation values, variance, covariance, correlation functions and errors;
  • Simpler models, binomial distribution, the Poisson distribution, simple and multivariate normal distributions;
  • Central elements of Bayesian statistics and modeling;
  • Gradient methods for data optimization;
  • Monte Carlo methods, Markov chains, Metropolis-Hastings algorithm;
  • Linear methods for regression and classification;
  • Estimation of errors using cross-validation, blocking, bootstrapping and jackknife methods;
  • Practical optimization using Singular-value decomposition and least squares for parameterizing data.
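
To make the idea of resampling-based error estimation concrete, the following is a minimal sketch of the bootstrap applied to the standard error of a sample mean. The helper name `bootstrap_std_error`, the number of resamples and the synthetic normal data are assumptions made for this example, not part of the course material.

```python
import numpy as np

def bootstrap_std_error(data, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = np.random.default_rng(seed)
    n = data.size
    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        # Draw n points from the data, with replacement, and recompute the statistic.
        sample = rng.choice(data, size=n, replace=True)
        estimates[i] = statistic(sample)
    # The spread of the resampled statistics estimates the standard error.
    return estimates.std(ddof=1)

# Synthetic data (assumed for illustration): 400 draws from a normal distribution.
rng = np.random.default_rng(1)
data = rng.normal(loc=0.0, scale=2.0, size=400)

se_boot = bootstrap_std_error(data, np.mean)
se_analytic = data.std(ddof=1) / np.sqrt(data.size)

print("Bootstrap standard error:", se_boot)
print("Analytic standard error: ", se_analytic)
```

For the mean of independent samples the analytic formula is available, so the two numbers can be compared directly. The bootstrap becomes useful for statistics where no such formula exists; for correlated data (for example Monte Carlo time series), methods such as blocking are typically needed instead.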

Machine learning

The following topics will be covered

  • Linear Regression and Logistic Regression;
  • Neural networks and deep learning;
  • Decision trees and nearest neighbor algorithms;
  • Support vector machines;
  • Bayesian Neural Networks;
  • Boltzmann Machines;
  • Dimensionality reduction, from PCA to cluster models.

Hands-on demonstrations, exercises and projects aim at deepening your understanding of these topics.

Computational aspects play a central role and you are expected to work on numerical examples and projects which illustrate the theory and methods. We strongly recommend forming small project groups of 2-3 participants.
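
As one example of the kind of small numerical experiment meant here, the sketch below performs principal component analysis (PCA) through the singular value decomposition of the centered data matrix. The function name `pca` and the synthetic three-feature data set are assumptions for illustration only.

```python
import numpy as np

def pca(X, n_components):
    """PCA via the SVD of the centered data matrix.

    Returns the projected data, the principal directions (as rows),
    and the variance explained by each retained component.
    """
    X_centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance

# Synthetic data (assumed for illustration): two strongly correlated
# features plus one feature containing only small noise.
rng = np.random.default_rng(0)
t = rng.standard_normal(200)
X = np.column_stack([
    t,
    2.0 * t + 0.1 * rng.standard_normal(200),
    0.1 * rng.standard_normal(200),
])

Z, components, variances = pca(X, n_components=2)
print("First principal direction:", components[0])
print("Explained variances:      ", variances)
```

Because the first two features vary together, most of the variance is captured by a single direction, roughly proportional to (1, 2, 0) up to sign. The sign of each principal direction is arbitrary.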

Practicalities

  1. Four lectures per week, Fall semester, 10 ECTS;
  2. Four hours of laboratory sessions for work on computational projects;
  3. Three graded projects, each counting 1/3 of the final grade;
  4. A selected number of weekly assignments;
  5. The course is part of the CS Master of Science program, but is open to other bachelor and Master of Science students at the University of Oslo;
  6. Grading scale: Grades are awarded on a scale from A to F, where A is the best grade and F is a fail;
  7. The course is offered as FYS-STK4155 (Master of Science level) and FYS-STK3155 (senior undergraduate);
  8. We use Piazza for course communication. A link explaining how to register for Piazza can be found on the official University of Oslo page for the course. Slack is also used for course communication.

Possible textbooks

Recommended textbooks:

  • Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer
  • Aurélien Géron, Hands‑On Machine Learning with Scikit‑Learn and TensorFlow, O'Reilly

General books on statistical analysis:

  • Christian Robert and George Casella, Monte Carlo Statistical Methods, Springer
  • Peter Hoff, A First Course in Bayesian Statistical Methods, Springer

General Machine Learning Books:

  • Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press
  • Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer
  • David J.C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press
  • David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press

Links to relevant courses at the University of Oslo

The link here https://www.mn.uio.no/english/research/about/centre-focus/innovation/data-science/studies/ gives an excellent overview of courses on Machine learning at UiO.