
Companion repository for QMSS GR5069 - Topics in Applied Data Science for Social Scientists


QMSS G5069 - TOPICS IN APPLIED DATA SCIENCE FOR SOCIAL SCIENTISTS

Marco Morales, Columbia University

This repository is a companion to the course Topics in Applied Data Science for Social Scientists taught at the Quantitative Methods in the Social Sciences program over the Spring of 2018.

It contains references, slides, code, and starter files for data challenges. You can also find the most up-to-date version of the course syllabus here. Make sure to check it regularly.

Overview

In his now-classic Venn diagram, Drew Conway described Data Science as sitting at the intersection of good hacking skills, math and statistics knowledge, and substantive expertise. By training, social scientists possess a fluid combination of all three, but also bring an additional layer to the mix. We have acquired slightly different training, skills and expertise tailored to understanding human behavior and to explaining why things happen the way they do. Social scientists are, thus, a particular kind of data scientist.

This course is not intended to teach students how to code, create visualizations, or estimate models. It presumes you have learned that in other classes. This course is intended to take students to the next level in becoming a data scientist. Therefore you will:

  • sharpen your technical skills and better align them with common business use cases and expectations,
  • learn current best practices in data science that will facilitate collaboration with data scientists trained in engineering or other hard sciences, and
  • learn soft skills that are key to a successful interaction with business stakeholders.

All of these are highly valued skills in the data science job market, but they are seldom treated as an integral part of a data scientist's training.

Prerequisites

It is assumed that students have basic to intermediate knowledge of R, including experience using it for data manipulation, visualization, and model estimation. Some background in mathematics, statistics, econometrics and algebra is also assumed.

Course Resources

There are no required textbooks for this course, but you might find these to be very useful resources for the course and later in your careers:

To actively participate in this course

By the second session, make sure to have the latest versions of R, RStudio, and Git on your computer. Also, make sure to have registered for a GitHub account.
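
A quick way to confirm part of this setup from within R is sketched below. It is a minimal check under stated assumptions, not an official course script: `check_tool` is a hypothetical helper name, and it only looks for the `git` executable on the system PATH using base R's `Sys.which`. RStudio and the GitHub account still need to be verified by hand.

```r
# Minimal setup check using base R only (no packages required).
# `check_tool` is a hypothetical helper, not part of the course materials.
check_tool <- function(tool) {
  path <- unname(Sys.which(tool))  # "" when the tool is not on the PATH
  !is.na(path) && nzchar(path)
}

# Report the version of R that is currently running
cat("R version:", R.version.string, "\n")

# Report whether git can be found
if (check_tool("git")) {
  cat("git found at:", unname(Sys.which("git")), "\n")
} else {
  cat("git not found on the PATH; install it before the second session\n")
}
```

If git is installed but not reported, it may simply be missing from the PATH that R sees; restarting RStudio after installing git often resolves this.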

Accessing course materials

You have two options to access the materials on this repository:

  1. Dynamic: clone the repository by clicking on the "Open in Desktop" button. If you do not have a git client installed on your system, you will need to get one here and make sure that git itself is installed. This is perhaps the best option, since you can refresh your clone as new content gets pushed.

  2. Static: download the entire repository as a zip file by clicking on the "Download ZIP" button. Note that you will have to download it again every time it is updated (and it will be updated at least weekly during the semester).

You can also subscribe to the repository. This will send you updates each time new changes are pushed to the repository.
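
For those who prefer to work from the R console rather than the "Open in Desktop" button, the dynamic option might be sketched as follows. This assumes git is on the PATH. The `<owner>` part of `repo_url` is a placeholder to be replaced with the URL copied from the repository page, and `git_args` and `local_dir` are hypothetical names chosen for illustration.

```r
# Sketch of the "dynamic" option from the R console, assuming git is on the PATH.
# `repo_url` contains a placeholder: copy the real URL from the repository page.
repo_url  <- "https://github.com/<owner>/QMSS-GR5069_Spring2018.git"
local_dir <- "QMSS-GR5069_Spring2018"

# Build the git arguments: clone the first time, pull on later runs.
# `git_args` is a hypothetical helper, not part of the course materials.
git_args <- function(url, dir) {
  if (dir.exists(dir)) {
    c("-C", dir, "pull")      # refresh an existing clone
  } else {
    c("clone", url, dir)      # create the clone
  }
}

cmd_args <- git_args(repo_url, local_dir)
cat("git", cmd_args, "\n")    # show the command that would be run

# Uncomment once `repo_url` points at the real repository:
# system2("git", shQuote(cmd_args))
```

Running the same lines again after the first clone issues a `git pull`, which is the refresh step that the static ZIP download cannot offer.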