/lede-algorithms

Algorithms course materials for the Lede program at Columbia Journalism School


Algorithms - Lede 2018

A course on algorithms for doing journalism.

Course overview

This is a course on algorithmic data analysis in journalism. We will cover basic methods for working with large(ish) data sets, and a variety of techniques used in story production, from regression to simulation to machine learning.

There are two main ways algorithms are combined with journalism: we can use algorithms to analyze data to produce stories, and we can do stories about algorithms that affect people's lives. We will do both.

  • Instructor: Jonathan Stray, jms2361@columbia.edu
  • Dates: Mondays and Wednesdays, 7/18-8/29
  • Class: 10am-1pm
  • Location: World Room
  • Lab: 2pm-5pm
  • Slack channel: #algorithms

Schedule

This is a rough outline, and subject to change, but your homework assignments will always be up to date!

Every Monday, you must bring in an algorithmic story to share with the class.

Homework is due before the following class.

Week 1 - Introduction to Algorithms

Algorithms for doing journalism, journalism about algorithms. The purpose of mathematical formalism. csvkit for working with large files. Homework:

  • Use a Jupyter notebook to prove that an average of averages is not the same as the overall average. Similarly for median.
  • Work out when the overall average and an average of averages are equal, and prove formally that this must be so.
  • Show that this really works, by computing the values in Jupyter.
  • Repeat this exercise for the median.
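As a starting point for the homework above, here is a minimal sketch of the kind of check you might run in a notebook. It is not an official solution: the data is made up, and the helper names (`overall`, `stat_of_stats`, `weighted_mean_of_means`) are hypothetical. It compares a statistic computed on the pooled data against the same statistic computed on per-group results, and suggests that weighting each group mean by its group size recovers the overall mean. The formal proof is still left to you.

```python
from statistics import mean, median

# Made-up data: two groups of different sizes.
unequal_groups = {
    "small_town": [10, 20],
    "big_city": [30, 40, 50, 60],
}

# Made-up data: two groups of the same size.
equal_groups = {
    "a": [1, 2, 30],
    "b": [3, 4, 5],
}


def overall(values_by_group, stat):
    """Apply stat to all values pooled into one list."""
    pooled = [v for values in values_by_group.values() for v in values]
    return stat(pooled)


def stat_of_stats(values_by_group, stat):
    """Apply stat to each group, then apply stat again to the per-group results."""
    return stat([stat(values) for values in values_by_group.values()])


def weighted_mean_of_means(values_by_group):
    """Average the group means, weighting each by its group size."""
    total = sum(len(values) for values in values_by_group.values())
    weighted_sum = sum(len(values) * mean(values) for values in values_by_group.values())
    return weighted_sum / total


for name, groups in [("unequal sizes", unequal_groups), ("equal sizes", equal_groups)]:
    print(name)
    print("  overall mean:        ", overall(groups, mean))
    print("  mean of means:       ", stat_of_stats(groups, mean))
    print("  weighted mean of means:", weighted_mean_of_means(groups))
    print("  overall median:      ", overall(groups, median))
    print("  median of medians:   ", stat_of_stats(groups, median))
```

In this sketch the mean of means differs from the overall mean when the group sizes differ and agrees when they are equal, while the median of medians differs from the overall median in both cases. Try other data sets to see whether that pattern holds before you attempt the proof.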

Week 2 - Text Processing

Week 3 - Regression

Week 4 - Machine Learning

Week 5 - Network Analysis

Week 6 - Simulations 1

Week 7 - Simulations 2