Hypotheses and Models in Data Intensive Domains

Hypotheses and Models in Data Intensive Domains Course, @hamc2019, Masters 1/2

Faculty of Computational Mathematics and Cybernetics, Moscow State University

Classes: Wednesday, 18.30 - 20.00, room 606

M. Jordan: ... current focus on doing AI research via the gathering of data, the deployment of “deep learning” infrastructure, and the demonstration of systems that mimic certain narrowly-defined human skills — with little in the way of emerging explanatory principles —  tends to deflect attention from major open problems in classical AI. These problems include the need to bring meaning and reasoning into systems that perform natural language processing, the need to infer and represent causality, the need to develop computationally-tractable representations of uncertainty and the need to develop systems that formulate and pursue long-term goals. These are classical goals in human-imitative AI, but in the current hubbub over the “AI revolution,” it is easy to forget that they are not yet solved.

We have to have error bars around all our predictions ... Otherwise it's gambling, and too many failed predictions can lead to big disappointment with Big Data - a Big Data Winter.

M. Brodie: Yet there is a potential Big Data Winter ahead if people blindly apply Big Data and more specifically machine learning.

Course overview

This is a one-term course that surveys the theory and application of methods for working with hypotheses and models in data-intensive domains. Topics include an overview of approaches to formulating, representing, and testing hypotheses and models; logical and probabilistic inference; and model quality assessment. The course is part of the Big Data track and is taught to first- and second-year master's students.

Course outcomes

  • The main objective of this course is to give an overview of the hypothesis-driven approach and to build the skills needed for empirical research in data-intensive domains
  • The course aims to provide students with techniques and recipes for applying a statistical/probabilistic framework to assess the quality of models
  • The course also emphasizes recent developments in hypothesis management and presents some open questions and areas of ongoing research
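
As a small illustration of what assessing model quality with a statistical framework can look like, the sketch below puts a percentile bootstrap confidence interval around a classifier's accuracy. It is not taken from the course materials: the function name `bootstrap_ci`, its parameters, and the data are hypothetical, and the percentile bootstrap is only one of several ways to obtain such an interval.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Hypothetical helper for illustration; returns (lower, upper).
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Cut off alpha/2 of the resampled means on each side.
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = n_resamples - 1 - lo_idx
    return means[lo_idx], means[hi_idx]


# Made-up data: 1 = correct prediction, 0 = wrong prediction.
correct = [1] * 83 + [0] * 17
low, high = bootstrap_ci(correct)
print(f"accuracy = {statistics.fmean(correct):.2f}, "
      f"95% CI = [{low:.2f}, {high:.2f}]")
```

Reporting the interval alongside the point estimate shows how much the accuracy could vary with a different sample of the same size.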

How students' time is spent

  • 2 hours per week - lectures
  • 4 hours per week - homework

Assessment

  • 40% - Final Oral Exam
  • 30% - Class tests
  • 30% - Homework

Grade 5: 80 - 100%; grade 4: 60 - 79%; grade 3: 40 - 59%; below 3: 0 - 39%.
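
The weighting and grade bands above can be worked through with a short sketch. It rests on assumptions not stated in the course description: each component is scored from 0 to 100, the total is not rounded before it is compared with the thresholds, and the band below 3 is reported as grade 2. The function names are hypothetical.

```python
# Assessment weights in percent, as listed in the course description.
WEIGHTS = {"exam": 40, "tests": 30, "homework": 30}


def final_score(exam, tests, homework):
    """Weighted total in percent; each component is assumed to be in [0, 100]."""
    components = {"exam": exam, "tests": tests, "homework": homework}
    for name, value in components.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    return sum(WEIGHTS[name] * value for name, value in components.items()) / 100


def grade(score):
    """Map a percentage to the 5-point scale.

    Assumptions: no rounding before comparison, and the band below 3
    is reported as 2.
    """
    if score >= 80:
        return 5
    if score >= 60:
        return 4
    if score >= 40:
        return 3
    return 2


# Example: 70% on the oral exam, 90% on class tests, 80% on homework.
total = final_score(exam=70, tests=90, homework=80)
print(total, grade(total))
```

In this example the total is 0.4 × 70 + 0.3 × 90 + 0.3 × 80 = 79%, which falls in the 60 - 79% band and therefore gives grade 4.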

Instructor

Dmitry Kovalev

Assistants

Course Materials

This repository contains the lectures and homework assignments for @hamc2019. It will be updated as the class progresses.

| Week | Lecture notes | Supplementary materials | Homework | Tests |
|------|---------------|-------------------------|----------|-------|
| 1 | Introduction | Hypothesis-driven approach; M. Jordan about AI revolution; J. Gray. Fourth Paradigm; M. Brodie. Understanding Data Science; L. Kalinichenko. Methods and Tools for Hypothesis-Driven Research Support | | |
| 2 | Lecture 1 | Course at KhanAcademy; Limitations of CLT; CI and hypotheses | | test_1 |
| 3 | Lecture 2 | | homework_1 | |