Columbia University, Lede Program
Tuesdays and Thursdays, May 24th 2016 through July 7th 2016, 10am
Allison Parrish, instructor.
Office hours: By appointment only.
For FERPA reasons, I ask that you e-mail me at my Columbia address when discussing any matters related to this class or your grade. Personal or professional inquiries can go to my personal address.
Consideration of both the scientific and social implications of counting, of turning the world into bits. Through the process of gaining fluency in the use of Python, students will spend some time thinking through representations of core "data types" like time, location, text, image, sound and relationships (or networks), and the computational "affordances" associated with each. Students will study several common metaphors for organizing and storing data – from structureless key-value stores, to a single table or spreadsheet, to the "multiple tables" of a relational database. We will also discuss ideas behind publishing or sharing data, moving from HTML documents and Web 1.0 to data services and APIs in Web 2.0, to semantics in Web 3.0. Student work and discussion will underscore the reality that data are plentiful and circulate and interact in a kind of informational ecosystem. As researchers, our students will be called on both to access and to publish data products.
Notes for previous versions of the course:
There will be six homework assignments in this class, each assigned on Thursday and due the following Tuesday before the beginning of class. The homework assignments are designed to test and expand your knowledge of the technical concepts introduced in class. Each homework assignment is worth 10% of your grade.
With the exception of the first assignment, all homeworks will take the form of an IPython Notebook that you fill in and send to a TA for grading. (We'll discuss the specifics of this in class.)
- 40% Attendance and participation
- 60% Homework assignments (10% each)
- Orientation
- Student introductions
- SQL basics
Homework #1 (due May 31): Read and respond to the following.
- Relational and Non-Relational Models in the Entextualization of Bureaucracy by Michael Castelle
- Literature is not Data: Against the Digital Humanities by Stephen Marche
- Machine Bias by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner
These essays each address the limits and consequences of data-driven analysis and public policy. Your response should take the form of a brief e-mail to me (3-5 paragraphs). In it, describe the critique made in one or more of the essays and discuss how (if at all) you might incorporate their critique(s) into your practice as a journalist. Also include, and comment on, a link to an essay or article that you feel "speaks to" the points raised in one or more of the essays (e.g., one that agrees with them, provides a counterexample, expands upon them, or responds to them).
- SQL continued
- IPython/Jupyter Notebooks. Basics, Running Code, Markdown tutorial
- Installing Python Libraries (other notes TK)
- Using SQL in Python
- SQL and CSVs
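The "Using SQL in Python" and "SQL and CSVs" topics can be previewed with a short example. It uses Python's built-in `csv` and `sqlite3` modules to load a few CSV rows into an in-memory SQLite table and then run an aggregate query on them. The table name, columns, and rows are hypothetical, and the class may use a different database or library. Treat this as a sketch, not the course's own code.

```python
import csv
import io
import sqlite3

# Hypothetical CSV data; with a real file you would use
# open("data.csv", newline="") instead of an in-memory string.
CSV_TEXT = """name,borough,population
Alpha,Brooklyn,120
Beta,Queens,80
Gamma,Brooklyn,45
"""


def load_csv_into_sqlite(csv_text):
    """Load CSV rows into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places (name TEXT, borough TEXT, population INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["name"], r["borough"], int(r["population"])) for r in reader]
    # "?" placeholders let sqlite3 quote values safely (no string formatting).
    conn.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load_csv_into_sqlite(CSV_TEXT)

# Plain SQL, run from Python: total population per borough, largest first.
query = """
    SELECT borough, SUM(population) AS total
    FROM places
    GROUP BY borough
    ORDER BY total DESC
"""
for borough, total in conn.execute(query):
    print(borough, total)
```

The same `conn.execute(...)` pattern works for any SQL statement. Only the query string changes.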
To install Jupyter Notebook on OS X:
sudo pip3 install jupyter
Depending on how you've installed Python on Windows, try one of the following:
pip3 install jupyter
py -3 -m pip install jupyter
Homework #2 (due June 7): Working with SQL.
Homework #3 (due June 14): Web scraping.
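The syllabus doesn't say which scraping library the class uses. The sketch below relies only on the standard library's `html.parser`, applied to a hypothetical HTML string, so it runs offline. The page structure, including the `headline` class, is an assumption. Third-party libraries such as BeautifulSoup are a common and more convenient choice for real scraping.

```python
from html.parser import HTMLParser

# A hypothetical page; a real scraper would first download the HTML,
# e.g. with urllib.request.urlopen(url).read().decode("utf-8").
HTML = """
<html><body>
  <h2 class="headline"><a href="/story/1">First story</a></h2>
  <h2 class="headline"><a href="/story/2">Second story</a></h2>
  <p><a href="/about">About us</a></p>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collect (text, href) pairs for links inside <h2 class="headline">."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.current_href = None
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) tuples
        if tag == "h2" and attrs.get("class") == "headline":
            self.in_headline = True
        elif tag == "a" and self.in_headline:
            self.current_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False
        elif tag == "a":
            self.current_href = None

    def handle_data(self, data):
        # Keep only non-blank text that sits inside a headline link.
        if self.in_headline and self.current_href and data.strip():
            self.headlines.append((data.strip(), self.current_href))


parser = HeadlineParser()
parser.feed(HTML)
print(parser.headlines)
```

The "About us" link is skipped because it is not inside a headline `<h2>`.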
- Working with unstructured data
- List comprehensions (scroll down; needs to be translated to Python 3)
- Strings and regular expressions (needs to be translated to Python 3)
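The list comprehension and regular expression topics combine naturally when you work with unstructured text. This Python 3 sketch transforms and filters a list of lines, then pulls out phone-number-like strings with the `re` module. In Python 3, `print` is a function, which is one of the usual changes when translating older Python 2 notes. The sample lines and the simple `ddd-ddd-dddd` pattern are illustrative assumptions, not a general phone-number matcher.

```python
import re

# Hypothetical lines of unstructured text (555-01xx numbers are fictional).
lines = [
    "Call 212-555-0147 for details.",
    "No number on this line.",
    "Office: 718-555-0199, fax 718-555-0100",
]

# List comprehension that transforms each item: words per line.
word_counts = [len(line.split()) for line in lines]

# A regular expression for US-style numbers written as ddd-ddd-dddd.
phone_pattern = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

# List comprehension with a filter: keep only lines containing a match.
lines_with_numbers = [line for line in lines if phone_pattern.search(line)]

# Nested comprehension: every match on every line, flattened into one list.
numbers = [match for line in lines for match in phone_pattern.findall(line)]

print(word_counts)   # [4, 5, 4]
print(numbers)       # ['212-555-0147', '718-555-0199', '718-555-0100']
```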
Homework #4 (due June 21): List comprehensions and regular expressions
Homework #5 (due June 28): SQL schema design
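Schema design is about the "multiple tables" model of a relational database. The minimal SQLite sketch below, built with Python's `sqlite3` module, stores each kind of entity in its own table. It links the tables with a foreign key, reassembles them with a JOIN, and lets the database reject a row that points at nothing. The `reporters` and `stories` tables and their contents are hypothetical examples, not the homework's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this per-connection pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

# One table per kind of entity; stories refer to reporters by id
# instead of repeating the reporter's name on every row.
conn.executescript("""
CREATE TABLE reporters (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE stories (
    id          INTEGER PRIMARY KEY,
    headline    TEXT NOT NULL,
    reporter_id INTEGER NOT NULL REFERENCES reporters(id)
);
""")

conn.executemany(
    "INSERT INTO reporters (id, name) VALUES (?, ?)",
    [(1, "Reporter One"), (2, "Reporter Two")],
)
conn.executemany(
    "INSERT INTO stories (headline, reporter_id) VALUES (?, ?)",
    [("Budget vote delayed", 1), ("New bike lanes", 1), ("School board race", 2)],
)
conn.commit()

# A JOIN reassembles the related rows at query time.
JOIN_QUERY = """
    SELECT reporters.name, COUNT(stories.id) AS story_count
    FROM reporters
    JOIN stories ON stories.reporter_id = reporters.id
    GROUP BY reporters.id, reporters.name
    ORDER BY story_count DESC
"""
rows = conn.execute(JOIN_QUERY).fetchall()
print(rows)  # [('Reporter One', 2), ('Reporter Two', 1)]


def orphan_story_rejected(conn):
    """Return True if the schema refuses a story whose reporter doesn't exist."""
    try:
        conn.execute(
            "INSERT INTO stories (headline, reporter_id) VALUES (?, ?)",
            ("Orphan story", 99),
        )
    except sqlite3.IntegrityError:
        return True
    return False


print(orphan_story_rejected(conn))  # True
```

Splitting reporters into their own table means a name is stored once. A typo then has to be fixed in one place rather than on every story row.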
- Making a Flask app (the template files referenced can be found in the templates folder of this repository)
Homework #6: Web applications.
- The Twitter API (see the completed lake_bot.py in this repo)
- Homework review
- If we have time: Intro to NLP with TextBlob
Extra credit homework assignment: Create and deploy a Twitter bot. The bot should either respond to updates on an external data source (like NYT 4th Down Bot or Congress Edits) or iterate through/randomly select data for presentation (like Census Americans). This extra credit assignment is worth up to 5% of your final grade. Complete this assignment by July 12th. Send me a link to the bot's Twitter account and a zip file containing the bot's source code.
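For a bot of the second kind, which iterates through or randomly selects data, the content-generation half can be written and tested without touching Twitter at all. The sketch below covers only that half. The records, field names, and message wording are hypothetical. Posting would go through the Twitter API, as in lake_bot.py, and that step is deliberately left out rather than guessed at here.

```python
import random

# Hypothetical records; a real bot might load these from a CSV file or database.
RECORDS = [
    {"place": "Town A", "median_age": 34.5},
    {"place": "Town B", "median_age": 41.2},
    {"place": "Town C", "median_age": 29.8},
]

MAX_LENGTH = 140  # Twitter's character limit per tweet in 2016


def compose_status(record):
    """Format one record as a status message no longer than MAX_LENGTH."""
    text = "In {}, the median age is {}.".format(record["place"], record["median_age"])
    if len(text) > MAX_LENGTH:
        # Trim and mark the cut so the message still fits.
        text = text[: MAX_LENGTH - 3] + "..."
    return text


def random_status(records, rng=random):
    """'Randomly select' style: format one randomly chosen record."""
    return compose_status(rng.choice(records))


def iterate_statuses(records):
    """'Iterate through' style: yield one status per record, in order."""
    for record in records:
        yield compose_status(record)


rng = random.Random(2016)  # seeded so repeated runs pick the same record
print(random_status(RECORDS, rng))
for status in iterate_statuses(RECORDS):
    print(status)
```

Keeping message composition separate from posting means you can check every message in a notebook before the bot publishes anything.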