A rapidly increasing number of applications in industry, academia, and everyday life are, or should be, based on careful analysis of data. With more and more datasets readily available, some industries have described themselves as “drowning in data”. This course aims to communicate that anyone and everyone needs to know how to be data-curious, how to access data, and how to analyze data. Students will learn that, with the right tools from statistics and computer science, we can take advantage of growing amounts of data without drowning in them, and that almost any question about the world can be answered using data. They will also learn how to find relevant data sources on the web and how to evaluate those sources critically. Furthermore, the course will explore topics such as the reproducibility of data analyses (through the consistent use of literate programming and version control tools throughout the course), as well as data privacy, data sharing, and data science ethics, all of which are increasingly important in today’s society.
The course draws on tools and techniques from statistics, mathematics, computer science, the social sciences, and the digital humanities to introduce students to the many facets of data analysis: data visualization, wrangling, and sampling to obtain a suitable dataset; data management to access data quickly and reproducibly; exploratory data analysis to generate hypotheses and intuition; modeling to understand and quantify patterns and to make predictions; and effective communication of results using visualizations and interpretable summaries.
In each assignment, assessment, and case study, as well as in the semester-long project, students will use data analysis skills to solve problems and will present their process and results both as fully reproducible written reports and as oral presentations.