Marco Morales, Columbia University
This repository is a companion to the course Topics in Applied Data Science for Social Scientists taught at the Quantitative Methods in the Social Sciences program over the Spring of 2018.
It contains references, slides, code, and starter files for data challenges. You can find the most up-to-date version of the course syllabus here as well. Make sure to check it regularly.
In his now-classic Venn diagram, Drew Conway described Data Science as sitting at the intersection of good hacking skills, math and statistics knowledge, and substantive expertise. By training, social scientists possess a fluid combination of all three, but also bring an additional layer to the mix. We have acquired slightly different training, skills and expertise tailored to understand human behavior, and to explain why things happen the way they do. Social scientists are, thus, a particular kind of data scientist.
This course is not intended to teach students how to code, create visualizations, or estimate models. It presumes you have learned that in other classes. This course is intended to take students to the next level in becoming a data scientist. Therefore you will:
- sharpen your technical skills and better align them with common business use cases and expectations,
- learn current best practices in data science that will facilitate collaboration with data scientists trained in engineering or other hard sciences, and
- learn soft skills that are key to a successful interaction with business stakeholders.
All of these are highly valued skills in the data science job market, but they are seldom considered an integral part of a data scientist's training.
It is assumed that students have basic to intermediate knowledge of R, including experience using it for data manipulation, visualizations, and model estimation. Some knowledge of mathematics, statistics, econometrics and algebra is also assumed.
There are no required textbooks for this course, but you might find these to be very useful resources for the course and later in your careers:
- Grolemund, Garrett and Hadley Wickham. 2016. R for Data Science. Boston, MA: O'Reilly Media. Alternatively, you can consult the online version of the text here.
- Wickham, Hadley. 2014. Advanced R. Boca Raton, FL: Taylor and Francis. Alternatively, you can consult the online version of the text here.
- Chang, Winston. 2013. R Graphics Cookbook. Boston, MA: O'Reilly Media. Alternatively, you can consult the online version of the text here.
- Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis, Second Ed. New York, NY: Springer. You can get the code from the book here.
- Conway, Drew and John Myles White. 2012. Machine Learning for Hackers: Case Studies and Algorithms to Get You Started. Boston, MA: O'Reilly Media.
By the second session, make sure to have the latest versions of R, RStudio, and Git on your computer. Also, make sure to have registered for a GitHub account.
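If you want a quick way to confirm that the command-line tools are in place, the following is a minimal sketch and not part of the official course materials. It assumes a POSIX-style shell (for example Terminal on macOS or Linux, or Git Bash on Windows); `check_tool` is a hypothetical helper name introduced only for this illustration. RStudio is a desktop application and is easiest to check by simply opening it.

```shell
# Report whether a command-line tool is available on the PATH.
# check_tool is a hypothetical helper name, not part of the course materials.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_tool R
check_tool git

# Print the installed versions so you can compare them with the latest releases.
if command -v R >/dev/null 2>&1; then R --version | head -n 1; fi
if command -v git >/dev/null 2>&1; then git --version; fi
```

A tool reported as missing either is not installed or is not on your PATH.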
You have two options to access the materials on this repository:
- Dynamic: Clone the repository by clicking on the "Open in Desktop" button. If you do not have a Git client installed on your system, you will need to get one here and make sure that Git is installed. This is perhaps the best option, since you can refresh your clone as new content gets pushed.
- Static: Download the entire repository as a zip file by clicking on the "Download ZIP" button. Note that you will have to download it again every time it is updated (and it will be updated at least weekly during the semester).
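If you prefer the command line to the desktop client, the dynamic option can be sketched roughly as follows. This is an illustration under stated assumptions, not the course's prescribed workflow: `sync_repo` is a hypothetical helper name, and the repository URL and local folder name are placeholders you would supply yourself (copy the clone URL from the repository page).

```shell
# sync_repo is a hypothetical helper: clone on first use, refresh afterwards.
# $1 = repository clone URL (an assumption: copy it from the repository page)
# $2 = local folder where the materials should live
sync_repo() {
  if [ -d "$2/.git" ]; then
    # A clone already exists: fetch new content and fast-forward to it.
    git -C "$2" pull --ff-only
  else
    # First run: create a local clone of the repository.
    git clone "$1" "$2"
  fi
}

# Usage (replace the placeholder with the real clone URL):
# sync_repo "<repository-url>" course-materials
```

The `--ff-only` flag makes the refresh stop rather than create a merge if you have committed your own changes inside the clone, which tends to be the safer default for a folder you mainly read from.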
You can also subscribe to the repository. This will send you updates each time new changes are pushed to the repository.