Data Science

Overview

This course takes you further into the analysis of large, complex, and naturalistic data sets. We introduce and explore the advanced statistical and computational methods that this kind of analysis requires. We also relate these methods to experimental and lab methodology and to theories of cognitive functions, including social and linguistic ones. We start with the core principles of data science, then look more closely at the foundations of building and training artificial neural networks. After that, we turn to the analysis of time series data, including advanced "filtering" approaches.

Materials and Literature

Core Principles

A fantastic resource on the foundations of all of data science is the textbook by Chris Bishop:

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer-Verlag New York Inc.

This is available as a free PDF.

Artificial Neural Networks (ANNs)

We will use the online textbook by Michael Nielsen:

Nielsen, M. A. (2015). Neural Networks and Deep Learning. Determination Press.

Time Series Analysis

For this part of the course, we will use the online textbook FPP3 by Rob J Hyndman and George Athanasopoulos:

Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd edition). OTexts.

Additional Materials

In the later parts of the course, we will also look at scientific papers in the fields of time series analysis and cognitive neuroscience.

Lesson Plan

| Course week | Week of year | Topics and readings |
|---|---|---|
| 1 | 6 | Introduction, basics of data science (Bishop 1.1, 1.2.{1-3}) |
| 2 | 7 | Basics of data science, continued (Bishop 1.2.4, 1.3, 1.4) |
| 3 | 8 | ANNs: foundations (Nielsen 1) |
| 4 | 9 | ANNs: backpropagation and training of networks (Nielsen 2, 3) |
| 5 | 10 | ANNs: deep learning (Nielsen 5, 6) |
| 6 | 12 | Time series: intro, graphics, decomposition (FPP3 1, 2, 3) |
| 7 | 13 | Time series: features (FPP3 4) |
| 8 | 14 | Time series: forecaster’s toolbox (FPP3 5) |
| 9 | 16 | Time series: regression models (FPP3 7) |
| 10 | 17 | Time series: exponential smoothing (FPP3 8) |
| 11 | 18 | Time series: ARIMA (FPP3 9) |
| 12 | 19 | Belief updating: cognitive processing of sequential inputs |
| 13 | 20 | Belief updating: hierarchical Gaussian filters |
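
To give a rough idea of what "filtering" and belief updating mean in weeks 12 and 13, here is a minimal sketch of a one-dimensional Gaussian filter (a scalar Kalman filter). It is not the hierarchical Gaussian filter covered in the course and is not taken from the course materials. It assumes a hidden quantity that follows a random walk and is observed with Gaussian noise. The function name `gaussian_filter_1d` and all parameter names are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Tuple


def gaussian_filter_1d(
    observations: Iterable[float],
    prior_mean: float = 0.0,
    prior_var: float = 1.0,
    process_var: float = 0.0,  # assumed random-walk variance of the hidden state
    obs_var: float = 1.0,      # assumed variance of the observation noise
) -> List[Tuple[float, float]]:
    """Return the (mean, variance) of the belief after each observation."""
    mean, var = prior_mean, prior_var
    beliefs: List[Tuple[float, float]] = []
    for y in observations:
        # Predict: the hidden state may have drifted, so uncertainty grows.
        var_pred = var + process_var
        # Update: weight the prediction error by the relative uncertainty.
        gain = var_pred / (var_pred + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var_pred
        beliefs.append((mean, var))
    return beliefs
```

With the default settings, `gaussian_filter_1d([2.0, 2.0])` moves the belief mean from 0 to 1 and then to 4/3, while the variance shrinks from 1 to 1/2 and then to 1/3. Each new input shifts the belief by a fraction of the prediction error, and that fraction falls as the belief becomes more certain.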

Exam

Format and Deadlines

The format is very simple: you choose a dataset, analyze it using current data science methods, and write a paper on the results.

  • The product associated with your paper is the software you produce for the analysis. The whole analysis pipeline has to be submitted and is an integral part of the exam project.

  • Your software may (and is expected to) rely on existing tools; you don’t have to start from scratch.

  • Your chosen dataset may be publicly available, newly acquired, or available only to you.

  • By 3 April, you decide who (if anybody) you will work with.

  • By 24 April, you send me an abstract of your proposed paper (maximum 250 words) for approval.

  • By the date specified in the exam plan, you submit your exam project.

  • You submit a GitHub repository containing your text and product via GitHub Classroom. Additionally, you submit the same items via the Digital Exam system.

Formal requirements

As specified in the course description, ordinary examination and re-examination are as follows:

The examination consists of an individual take-home assignment on a topic of the student’s choice and a related practical product.

The scope and nature of the product must be relevant in relation to the content of the course and is subject to the approval of the teacher. It must be possible to submit the product digitally in a documented form which can be accessed by the supervisor and co-examiner. The product must be accompanied by a take-home assignment on a topic of the student’s choice, in which the student explains the relevance and methodological and theoretical basis of the product. Assessment is based on an overall assessment of the take-home assignment and the practical product.

The assignment can be written individually or in groups of up to 3 students. Group assignments must be written in such a way that the contribution of each student, except for the introduction, thesis statement and conclusion, can form the basis of individual assessment. The assignment should clearly state which student is responsible for which section.

  • Length for one student: 10-15 standard pages
  • Length for two students: 15-20 standard pages
  • Length for three students: 20-25 standard pages

The take-home assignment must be handed in for assessment in the Digital Exam system by the date specified in the exam plan.