Data Science and Statistics

1. The data state of mind

  • Class intro
  • Thinking in rows and columns (with a segue into unstructured text)
  • Basic commands for importing, manipulating and working with data in R

2. Trends and outliers

Exploring your data to find the two things that journalists care about most: trends and outliers.

  • Summary and descriptive statistics
  • Exploring data in one, two and N dimensions
  • Exploratory data visualization

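As a minimal sketch of the summary statistics and outlier hunting this section covers, the example below applies Python's standard `statistics` module (3.8+) to a made-up list of response times. The class itself works in R, where `summary()` gives a similar quick overview. The 1.5 × IQR cutoff (Tukey's fences) is one common rule of thumb for flagging outliers and is an assumption here, not a course prescription. `describe` and `iqr_outliers` are hypothetical helper names.

```python
import statistics

# Hypothetical data: emergency response times in minutes (made up for illustration).
times = [6.1, 5.8, 7.2, 6.5, 5.9, 6.8, 7.0, 6.3, 24.5, 6.0, 6.6, 5.7]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


summary = describe(times)
print(summary)
print(iqr_outliers(times))
```

The single extreme value pulls the mean above the median. That is one reason to check both before calling a number "typical."
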
3. Interviewing the data

Now that you have an understanding of the data, you can start to formulate some questions. This section covers how to ask those questions effectively and how to evaluate the answers.

  • Slicing, subsetting and otherwise querying data in R
  • Comparing apples to apples (or apples to oranges): variable units, z-scores, etc.
  • Understanding statistical significance

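Standardizing is one concrete way to compare apples to oranges. A z-score, z = (x - mean) / standard deviation, re-expresses each value as a number of standard deviations from its variable's mean, which strips out the original units. The sketch below applies this to invented rent and commute figures using Python's `statistics` module. The data and the helper name `z_scores` are hypothetical. In R, the built-in `scale()` performs the same standardization.

```python
import statistics

# Hypothetical data for five neighborhoods (made up for illustration).
rent_dollars = [1200, 1350, 1100, 1500, 2100]
commute_minutes = [25, 30, 28, 22, 45]


def z_scores(values):
    """Standardize: z = (x - mean) / sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mu) / sd for x in values]


rent_z = z_scores(rent_dollars)
commute_z = z_scores(commute_minutes)

# Both lists are now in "standard deviations from the mean", so the last
# neighborhood's rent and commute can be compared on the same scale.
print(round(rent_z[-1], 2), round(commute_z[-1], 2))
```

A z-score only says how unusual a value is relative to the other values in its column. It does not say whether the difference matters, which is where statistical significance comes in.
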
4. Building models

Analysts typically use models for one of two things: to predict or to explain. We'll go over use cases for both and cover two simple modeling techniques, linear and logistic regression.

  • Linear regression: predicting vs. explaining
  • Logistic regression: predicting vs. explaining

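A minimal sketch of the predicting-vs.-explaining distinction for simple linear regression is shown below. It fits one ordinary-least-squares line to made-up salary data and then uses it in two ways. The hypothetical helper `fit_line` computes the textbook one-predictor formulas: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and intercept = ȳ - slope · x̄. In R, `lm(salary ~ years)` fits the same model. Logistic regression (in R, `glm(..., family = binomial)`) follows the same split for yes/no outcomes but is not sketched here.

```python
# Hypothetical data: years of experience vs. salary in $1,000s (made up).
years = [1, 2, 3, 5, 8, 10]
salary = [42, 45, 50, 55, 66, 72]


def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(years, salary)

# Explaining: the slope is the estimated change in salary per extra year.
print(f"slope: {slope:.2f} thousand dollars per year")

# Predicting: plug a new x value into the fitted line.
print(f"predicted salary at 6 years: {intercept + slope * 6:.1f}")
```

The same fitted line serves both uses. Explaining focuses on whether the slope is meaningful and trustworthy, while predicting focuses on how close the line's outputs come to new observations.
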
5. Bringing it home to the newsroom

There aren't a whole lot of reporters and editors who got their jobs because they're good at math. So a big part of using data successfully in the newsroom is being able to communicate its value and its caveats to less data-savvy colleagues. We'll talk about strategies for bulletproofing stories and explaining conclusions.

  • Bulletproofing: duplicating the analysis, testing, sanity checking
  • Explaining results: understanding and explaining uncertainty, using data for finding examples vs. identifying trends, communicating these things in layman's terms
  • Communicating to readers: nerd boxes, data visualization, keeping the data out of the narrative