/persp-analysis_A18

Perspectives on Computational Analysis (MACS 30000), Autumn 2018

MACS 30000: Perspectives on Computational Analysis (Autumn 2018)

              Dr. Richard Evans     Joshua G. Mausolf (TA)  Nora Nickels (TA)
Email         rwevans@uchicago.edu  jmausolf@uchicago.edu   nnickels@uchicago.edu
Office        208 McGiffert House   204 McGiffert House     205 McGiffert House
Office Hours  Tu 10:30a-12:30p      M 1:30p-3:00p           W 2:00p-4:00p
GitHub        rickecon              jmausolf                nnickels
  • Meeting day/time: MW 11:30a-1:20p, 247 Saieh Hall for Economics
  • Lab session: W 4:30-5:20p, 247 Saieh Hall for Economics
  • Office hours also available by appointment

Course Description, Objectives, and Outcomes

Computational Social Science (CSS) combines the theoretical paradigms of the social sciences with the expanded data and computational methods of computer science. Massive digital traces of human behavior and ubiquitous computation have both extended and altered classical social science inquiry. This course surveys successful social science applications of computational approaches to the representation of complex data, information visualization, and model construction and estimation. We will examine the scientific method in the social sciences in the context of both theory development and testing, exploring how computation and digital data enable new answers to classic investigations, the posing of novel questions, and new ethical challenges and opportunities. Students will review fundamental research designs such as observational studies and experiments, statistical summaries, visualization of data, and how computational opportunities can enhance them. The focus of the course is on exploring the wide range of contemporary approaches to computational social science, with problem sets, programming exercises, and written assignments to gain experience with these methods.

  • You will be introduced to the major research paradigms in computational social science.
  • You will read recent seminal papers in CSS.
  • You will begin to practice implementing CSS methods through assignments.
  • You will write analytical assessments of papers, methods, and approaches.

Required Text

  • [S2018] Salganik, Matthew J., Bit by Bit: Social Research in the Digital Age, Princeton University Press, 2018. free online version
    • You should buy a copy of this book BECAUSE there is a free online version. It will also be a valuable reference in your personal library, and will remain relevant for many years.

Grades

Grades will be based on your performance on nine assignments, each of which is worth 10 points.

  • Homework: I will give you 9 assignments. Some of these will be writing assignments. Some of these will be computational exercises.

    • You must submit your assignments by committing and pushing them to your fork of this GitHub repository on your personal GitHub account in the appropriate folder (e.g., https://github.com/[YourGitHubHandle]/Assignments/A1/[filename]).
    • Assignments will be given on the day listed in the Daily Course Schedule section of this syllabus (see below). In general, assignments will be due before class at 11:30am a week after they are assigned. However, exact due dates and times will be listed on the assignment.
  • Plagiarism on writing assignments: Josh and Nora held a Wednesday night lab on what constitutes plagiarism and how to avoid it. Academic honesty is an extremely important principle in academia and at the University of Chicago. See the course Canvas site library reserves for two chapters on plagiarism.

    • Writing assignments must put in quotes and cite any excerpts taken from another work.
    • If the cited work is the particular paper referenced in the Assignment, no works cited or references are necessary at the end of the composition.
    • If the cited work is not the particular paper referenced in the Assignment, you MUST include a works cited or references section at the end of the composition.
    • Any copying of other students' work will result in a zero grade and potential further academic discipline.

Late Problem Sets

Late problem sets will be penalized 1 point for every hour, or part of an hour, that they are late. For example, if an assignment is due on Monday at 11:30am, the following points will be deducted based on the time stamp of the last commit.

Example PR last commit  Points deducted
11:31am to 12:30pm      -1 point
12:31pm to 1:30pm       -2 points
1:31pm to 2:30pm        -3 points
2:31pm to 3:30pm        -4 points
...                     ...
8:31pm and beyond       -10 points (no credit)
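
The schedule above amounts to one point per started hour after the deadline, capped at the 10 points an assignment is worth. A minimal Python sketch of that rule follows. The function name late_penalty, the constant MAX_PENALTY, and the treatment of seconds (any time past the deadline counts as late) are assumptions made for illustration; this is not the script used for grading.

```python
import math
from datetime import datetime

MAX_PENALTY = 10  # an assignment is worth 10 points, so -10 means no credit


def late_penalty(due, last_commit):
    """Return the points deducted, given the deadline and last-commit time.

    Assumption: every started hour after the deadline costs 1 point.
    """
    seconds_late = (last_commit - due).total_seconds()
    if seconds_late <= 0:
        return 0  # committed on time
    hours_started = math.ceil(seconds_late / 3600)
    return min(MAX_PENALTY, hours_started)


# Hypothetical deadline: Monday, Oct. 8, 2018 at 11:30am
due = datetime(2018, 10, 8, 11, 30)
example_commits = [
    datetime(2018, 10, 8, 11, 25),
    datetime(2018, 10, 8, 12, 30),
    datetime(2018, 10, 8, 12, 31),
    datetime(2018, 10, 8, 20, 31),
]
for commit in example_commits:
    print(commit.strftime("%I:%M%p"), "->", late_penalty(due, commit), "point(s) deducted")
```

Because the penalty is computed from the last commit, any commit pushed after the deadline moves the time stamp that is used.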

Daily Course Schedule

Date     Day  Topic                           Readings                                        Homework
Oct. 1   M    Introduction to Comp Soc Sci    Slides
Oct. 3   W    Git and GitHub                  Notes, Slides, CS2014                           A1
Oct. 8   M    Observational data, large data  S2018, Ch. 2
Oct. 10  W    Observational data              Slides                                          A2
Oct. 15  M    Observational data              F2015, RW2000, KW2009, A2017, EKLS2015
Oct. 17  W    Simulated data                  Slides                                          A3
Oct. 22  M    Simulated data                  M2002
Oct. 24  W    Asking questions                S2018, Ch. 3, Slides                            A4
Oct. 29  M    Asking questions                CE2015, WRGG2015, S2014, S2016, AH2012, B2014
Oct. 31  W    Experiments                     S2018, Ch. 4, Slides                            A5
Nov. 5   M    Experiments                     SNCGG2007, AR2014, CK2013, L2006
Nov. 7   W    Collaboration                   S2018, Ch. 5, Slides                            A6
Nov. 12  M    Collaboration                   W2014, BKV2010, EJQ2016
Nov. 14  W    Research collaboration          Slides, HJ2018                                  A7
Nov. 19  M    Ethics                          S2018, Ch. 6, Slides
Nov. 21  W    Ethics                          BF2015, Z2010                                   A8
Nov. 26  M    CSS: Sociology                  KTE2018, MDSW2017, Slides
Nov. 28  W    CSS: Political Science          B2018, GST2018, Slides                          A9
Dec. 3   M    CSS: Psychology                 SMBMYF2018, YSCBGS2014, Slides
Dec. 5   W    CSS: Economics                  A2018, BS2017, Slides

References

  • [A2017] Abrahao, Bruno, Paolo Parigi, Alok Gupta, and Karen S. Cook, "Reputation offsets trust judgments based on social biases among Airbnb users," PNAS, 114:37 (September 12, 2017), pp. 9849-9853.
  • [AR2014] Allcott, Hunt and Todd Rogers, "The Short-run and Long-run Effects of Behavioral Interventions: Experimental Evidence from Energy Conservation," American Economic Review, 104:10 (Oct. 2014), pp. 3003-3037.
  • [A1990] Angrist, Joshua D., "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records," American Economic Review, 80:3 (1990), pp. 313-336.
  • [AH2012] Ansolabehere, Stephen and Eitan Hersh, "Validation: What Big Data Reveal about Survey Misreporting and the Real Electorate," Political Analysis, 20:3, (2012), pp. 437-459.
  • [A2018] Athey, Susan, "The Impact of Machine Learning on Economics," in The Economics of Artificial Intelligence: An Agenda, eds. Ajay K. Agrawal, Joshua Gans, and Avi Goldfarb, National Bureau of Economic Research (forthcoming, 2018).
  • [B2009] Beazley, David M., Python Essential Reference, 4th edition, Addison-Wesley (2009).
  • [BKV2010] Bell, Robert M., Yehuda Koren, and Chris Volinsky, "All Together Now: A Perspective on the Netflix Prize," Chance, 23:1 (2010), pp. 24-29.
  • [B2014] Blumenstock, Joshua, "Calling for Better Measurement: Estimating an Individual's Wealth and Well-Being from Mobile Phone Transaction Records," presented at KDD--Data Science for Social Good 2014, New York (2014).
  • [B2018] Bonica, Adam, "Inferring Roll Call Scores from Campaign Contributions Using Supervised Machine Learning," American Journal of Political Science, (forthcoming, 2018). [link to paper]
  • [BS2017] Brumm, Johannes and Simon Scheidegger, "Using Adaptive Sparse Grids to Solve High-dimensional Dynamic Models," Econometrica, 85:5, pp. 1575-1612 (Sep. 2017)
  • [BF2015] Burnett, Sam and Nick Feamster, "Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests," in Proceedings of the 2015 ACM Conference on Special Interest Groups on Data Communication, ACM, London (2015), pp. 653-667.
  • [CE2015] Canann, Taylor J. and Richard W. Evans, "Determinants of Short-term Lender Location and Interest Rates," Journal of Financial Services Research, 48:3, (Dec. 2015) pp. 235-262. [link to paper]
  • [CS2014] Chacon, Scott and Ben Straub, Pro Git: Everything You Need to Know about Git, 2nd Edition, Apress, 2014. Free online version
  • [CK2013] Costa, Dora L. and Matthew E. Kahn, "Energy Conservation Nudges and Environmentalist Ideology: Evidence from a Randomized Residential Electricity Field Experiment," Journal of the European Economic Association, 11:3 (2013), pp. 680-702.
  • [DEP2018] DeBacker, Jason, Richard W. Evans, and Kerk L. Phillips, "Integrating Microsimulation Models of Tax Policy into a DGE Macroeconomics Framework," Public Finance Review, forthcoming. [link to paper]
  • [EKLS2015] Einav, Liran, Theresa Kuchler, Jonathan Levin, and Neel Sundaresan, "Assessing Sale Strategies in Online Markets Using Matched Listings," American Economic Journal: Microeconomics, 7:2 (2015), pp. 215-247.
  • [EJQ2016] Evans, Richard W., Kenneth L. Judd, and Kramer Quist, "Big Data Techniques as a Solution to Theory Problems," in Conquering Big Data with High Performance Computing, ed. Ritu Arora, Springer (2016). [link to paper]
  • [F2015] Farber, Henry S., "Why You Can't Find a Taxi in the Rain and Other Labor Supply Lessons from Cab Drivers," Quarterly Journal of Economics, 130:4 (2015), pp. 1975-2026.
  • [GST2018] Gentzkow, Matthew, Jesse M. Shapiro, and Matt Taddy, "Measuring Group Differences in High-dimensional Choices: Method and Application to Congressional Speech," NBER Working Paper #22423 (August 2018).
  • [G2018] Gopalan, Sushmita, "Predicting Infant Mortality: Minimizing False Negatives," unpublished MACSS thesis (2018). [link to paper]
  • [HJ2018] Humpherys, Jeffrey and Tyler J. Jarvis, "Unit Testing," Ch. 7 Labs for Foundations of Applied Mathematics: Python Essentials, Creative Commons, Open Access (2018). [link here]
  • [KW2009] Kossinets, Gueorgi and Duncan J. Watts, "Origins of Homophily in an Evolving Social Network," American Journal of Sociology 115:2, (2009), pp. 405-450.
  • [KTE2018] Kozlowski, Austin C., Matt Taddy, and James A. Evans, "The Geometry of Culture: Analyzing Meaning through Word Embeddings," working paper, Knowledge Lab, University of Chicago, under review (2018).
  • [L2010] Langtangen, Hans Petter, Python Scripting for Computational Science, Texts in Computational Science and Engineering, 3rd edition, Springer (2010).
  • [L2006] List, John A., "Friend or Foe? A Natural Experiment of the Prisoner's Dilemma," Review of Economics and Statistics, 88:3 (August 2006), pp. 463-471.
  • [L2013] Lutz, Mark, Learning Python, 5th edition, O'Reilly Media, Inc. (2013).
  • [MDSW2017] Mao, Andrew, Lili Dworkin, Siddharth Suri, and Duncan J. Watts, "Resilient Cooperators Stabilize Long-run Cooperation in the Finitely Repeated Prisoner’s Dilemma," Nature Communications, p. 13800 (January 2017).
  • [MM2009] Mas, Alexandre and Enrico Moretti, "Peers at Work," American Economic Review, 99:1 (2009), pp. 112-145.
  • [M2018] McKinney, Wes, Python for Data Analysis, 2nd edition, O'Reilly Media, Inc. (2018).
  • [M2002] Moretti, Sabrina, "Computer Simulation in Sociology: What Contribution?" Social Science Computer Review, 20:1 (Spring 2002), pp. 43-57.
  • [RW2000] Rosenzweig, Mark R. and Kenneth I. Wolpin, "Natural 'Natural Experiments' in Economics," Journal of Economic Literature, 38:4 (Dec. 2000), pp. 827-874.
  • [SMBMYF2018] Sanchez, Alessandro, Stephan C. Meylan, Mika Braginsky, Kyle E. MacDonald, Daniel Yurovsky, and Michael C. Frank, "childes-db: a Flexible and Reproducible Interface to the Child Language Data Exchange," under review (2018)
  • [SNCGG2007] Schultz, P. Wesley, Jessica M. Nolan, Robert B. Cialdini, Noah J. Goldstein, and Vladas Griskevicius, "The Constructive, Destructive, and Reconstructive Power of Social Norms," Psychological Science, 18:5 (2007), pp. 429-434.
  • [S2014] Sugie, Naomi F., "Finding Work: A Smartphone Study of Job Searching, Social Contacts, and Wellbeing After Prison," PhD Thesis, Princeton University (2014). [link here]
  • [S2016] Sugie, Naomi F., "Utilizing Smartphones to Study Disadvantaged and Hard-to-Reach Groups," Sociological Methods & Research, January (2016).
  • [WRGG2015] Wang, Wei, David Rothschild, Sharad Goel, and Andrew Gelman, "Forecasting Elections with Non-Representative Polls," International Journal of Forecasting, 31:3 (2015) pp. 980-991.
  • [W2014] Watts, Duncan J., "Common Sense and Sociological Explanations," American Journal of Sociology, 120:2 (Sep. 2014), pp. 313-351.
  • [WWE2018] Wu, Lingfei, Dashun Wang, and James A. Evans, "Large Teams Have Developed Science and Technology; Small Teams Have Disrupted It," working paper, 2018. [link here]
  • [YSCBGS2014] Yourganov, Grigori, Tanya Schmah, Nathan W. Churchill, Marc G. Berman, Cheryl L. Grady, and Stephen C. Strother, "Pattern Classification of fMRI Data: Applications for Analysis of Spatially Distributed Cortical Networks," NeuroImage, 96:1, pp. 117-132 (August 2014).
  • [Z2010] Zimmer, Michael, "But the Data is Already Public: On the Ethics of Research in Facebook," Ethics and Information Technology, 12:4 (2010), pp. 313-325.

Jupyter Notebooks

Jupyter notebooks are files that end with the *.ipynb suffix. They are opened in a browser through Jupyter, an open source web application that combines instructional text with live executable and modifiable code for many different programming platforms (e.g., Python, R, Julia). Jupyter notebooks are an ideal tool for teaching programming because they provide both the code for a user to execute and the context and explanation for that code. A number of Jupyter notebooks are provided in the OSM Lab boot camp repository Tutorials folder.

These notebooks used to be Python-specific, and were therefore called IPython notebooks (hence the *.ipynb suffix). Jupyter notebooks now support many programming languages, although the name still pays homage to Python with the vestigial "py" in "Jupyter". The notebooks execute code from the kernel of the specific programming language on your local machine.

Jupyter notebook capability is installed automatically with the Anaconda distribution of Python. If you did not download the Anaconda distribution of Python, you can install Jupyter separately by following the instructions on the Jupyter install page.

Opening a Jupyter notebook

Once Jupyter is installed--whether through Anaconda or through the Jupyter website--you can open a Jupyter notebook by the following steps.

  1. Navigate in your terminal to the folder in which the Jupyter notebook files reside. In the case of the Jupyter notebook tutorials in this repository, you would navigate to the ~/BootCamp2018/Tutorials/ directory.
  2. Type jupyter notebook at the terminal prompt.
  3. A Jupyter notebook session will open in your browser, showing the available *.ipynb files in that directory.
    • In some cases, you might receive a prompt in the terminal telling you to paste a URL into your browser.
  4. Double click on the Jupyter notebook you would like to open.

It is worth noting that you can also simply navigate to the URL of the Jupyter notebook file in the GitHub repository on the web (e.g., https://github.com/OpenSourceMacro/BootCamp2018/blob/master/Tutorials/PythonReadIn.ipynb). You can read the Jupyter notebook on GitHub.com, but you cannot execute any of the cells. You can only execute the cells in the Jupyter notebook when you follow the steps above and open the file from a Jupyter notebook session in your browser.

Using an open Jupyter notebook

Once you have opened a Jupyter notebook, you will find the notebook has two main types of cells: Markdown cells and Code cells. Markdown cells have formatted Jupyter notebook markdown text, and serve primarily to present context for the coding cells. A reference for the markdown options in Jupyter notebooks is found in the Jupyter markdown documentation page.

You can edit a Markdown cell in a Jupyter notebook by double clicking on the cell and then making your changes. Make sure the cell-type box in the middle of the top menu bar is set to Markdown. To render your changes in the Markdown cell, press Shift-Enter.

A Code cell will have an In [ ]: immediately to the left of the cell for input. The code in that cell can be executed by pressing Shift-Enter. For a Code cell, the cell-type box in the middle of the top menu bar says Code.
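
Under the hood, a notebook saved in the current (version 4) notebook format is a JSON file whose "cells" list records the type of each cell. The sketch below is a minimal illustration that counts the Markdown and Code cells in a notebook using only the Python standard library. The function name count_cell_types and the three-cell demo notebook are hypothetical and exist only for this example.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_cell_types(notebook_path):
    """Count the cells of each type (e.g., 'markdown', 'code') in a notebook file."""
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


# Hypothetical three-cell notebook, written to a temporary folder for the demo.
demo_notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Context for the code below"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1 + 1"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print(x)"]},
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.ipynb"
    demo_path.write_text(json.dumps(demo_notebook), encoding="utf-8")
    counts = count_cell_types(demo_path)

print(dict(counts))
```

To try it on a real notebook, pass the path of any *.ipynb file on your machine to count_cell_types.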

Closing a Jupyter notebook

When you are done with a Jupyter notebook, first save any changes that you want to keep. Then close the browser windows associated with that Jupyter notebook session. Finally, shut down the local server instance that was started in your terminal window to run the Jupyter notebook. On both Mac and Windows, this is done by going to your terminal window, pressing Ctrl-C, and then typing y for yes and hitting Enter.

Python tutorials

This course is not a programming course in which you receive in-class instruction on using a programming language such as Python or R. However, you will have assignments that require some basic use of a programming language. For this reason, I am pointing you to the OSM Lab boot camp repository, which contains six basic Python tutorials in the Tutorials directory.

  1. PythonReadIn.ipynb. This Jupyter notebook provides instruction on basic Python I/O, reading data into Python, and saving data to disk.
  2. PythonNumpyPandas.ipynb. This Jupyter notebook provides instruction on working with data using NumPy as well as Python's powerful data library pandas.
  3. PythonDescribe.ipynb. This Jupyter notebook provides instruction on describing, slicing, and manipulating data in Python.
  4. PythonFuncs.ipynb. This Jupyter notebook provides instruction on working with and writing Python functions.
  5. PythonVisualize.ipynb. This Jupyter notebook provides instruction on creating visualizations in Python.
  6. PythonRootMin.ipynb. This Jupyter notebook provides instruction on implementing univariate and multivariate root finders and unconstrained and constrained minimizers using functions in the scipy.optimize sub-library.

To further one's Python programming skills, a number of other great resources exist.

In addition, a number of excellent textbooks and reference manuals are very helpful and may be available in your local library, or you may want to have them in your own library. Lutz (2013) is a giant 1,500-page reference manual that has an expansive collection of materials targeted at beginners. Beazley (2009) is a more concise reference but is targeted at readers with some experience using Python. Despite its focus on a particular set of tools in the Python programming language, McKinney (2018) has a great introductory section that can serve as a good starting tutorial. Further, it focuses on Python's data analysis capabilities, which are among the language's most important features. Rounding out the list is Langtangen (2010). This book's focus on scientists and engineers makes it a unique reference for optimization, wrapping C and Fortran, and other scientific computing topics using Python.

Disability services

If you need any special accommodations, please provide Dr. Evans with a copy of your Accommodation Determination Letter (provided to you by the Student Disability Services office) as soon as possible so that you may discuss with him how your accommodations may be implemented in this course.