
Outline for School of Data IODC pre-event R workshop for journalists

License: GNU General Public License v3.0 (GPL-3.0)

R 4 Data-driven Journalism (DDJ)

School of Data Pre-event Training

Facilitator: David Selassie Opoku, School of Data & Open Knowledge (@sdopoku)

R4DDJ

R is a powerful statistical and graphics language and environment used by many individuals and organisations in their day-to-day work with data. In this short training, we explore what data-driven journalism (DDJ) is and why R is a great tool for the modern data journalist, and we get started with key features of R for doing data-driven journalism. At the end of the session, participants should:

  • Have a definition of what data-driven journalism is
  • Know what the process for doing data-driven journalism looks like
  • Learn about the Data Pipeline used by School of Data
  • Set up RStudio and be familiar with some of its features
  • Know some useful R commands for DDJ
  • Explore the ggplot2 package
  • Know where to go for a deeper dive into R 4 DDJ

Outline

Part 1

  • Milestone 1

    • Define data-driven journalism (DDJ)
    • Outline DDJ process
    • Outline the Data Pipeline
    • Explore why data journalists should care about R.
    • References
  • Milestone 2

    • Set up RStudio on local devices
    • Set up RStudio with the cloud service rollApp
    • Explore the RStudio environment and key features.
    • Introduce some key R functions for DDJ.
    • References

Part 2

  • Milestone 3
    • Concept of Grammar of Graphics.
    • Introduction to ggplot2.
    • Some graphs with ggplot2.
    • References

Part 1

Milestone 1: DDJ <> R - a relationship meant to be?

What is DDJ?

  • TO DO: Define data-driven journalism (DDJ)

DDJ Process

  • TO DO: Outline DDJ process

Data Pipeline

  • TO DO: Outline the Data Pipeline

DDJ + R = ?

  • TO DO: Explore why data journalists should care about R.

References

  • TO DO: A list of DDJ and R references

Milestone 2: Kicking off with R in the Studio.

About RStudio

RStudio is a powerful Integrated Development Environment (IDE) that provides a convenient environment for running R-related tasks and projects. I will briefly review some of the key features of RStudio, but see the RStudio cheatsheet for more details.

RStudio Setup

  • Set up RStudio on Your Computer

    1. Go to the CRAN website, download the version of R for your operating system, and install it on your computer.
    2. Go to the RStudio website, download the version of the RStudio IDE for your operating system, and install it on your computer.
  • Set up RStudio with the rollApp Service

    1. Visit the rollApp website.
    2. Sign up for an account.
    3. Ensure you can open and interact with the RStudio application through the rollApp platform.

Getting Familiar with RStudio

  • Menus
  • Panes/Windows
  • Source Editor
  • Console
  • Key Actions

    1. Create a project
    2. Create a script
    3. Help

Some Basic R Commands

  • Data Containers & Formats: vector, matrix, array, data frame, list, factor.
  • Functions: str, length, dim, names, summary, ls, help/?, read.csv, table, View, etc.
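
A minimal sketch of the containers and functions listed above, for trying out in the RStudio console. The data here is a hypothetical toy table (the facility names, regions, and bed counts are invented for illustration and are not taken from the Ghana Health Facilities Dataset), and it is read from an inline string rather than a file so that the example is self-contained.

```r
# Hypothetical toy data: names and figures are invented for illustration only.
csv_text <- "facility,region,type,beds
Clinic A,Ashanti,Clinic,12
Hospital B,Greater Accra,Hospital,150
Clinic C,Ashanti,Clinic,8
Health Centre D,Northern,Health Centre,25"

# read.csv() usually takes a file path; the text argument parses a string instead.
facilities <- read.csv(text = csv_text, stringsAsFactors = FALSE)  # data frame

# Data containers
beds  <- facilities$beds                  # vector: one column of the data frame
m     <- matrix(1:6, nrow = 2)            # matrix: 2 rows, 3 columns
a     <- array(1:24, dim = c(2, 3, 4))    # array: 3 dimensions
types <- factor(facilities$type)          # factor: categorical values
info  <- list(source = "toy data", n = nrow(facilities))  # list: mixed types

# Inspecting objects
str(facilities)            # structure: column names, types, first values
length(beds)               # number of elements in the vector
dim(facilities)            # rows and columns of the data frame
names(facilities)          # column names
summary(facilities$beds)   # min, quartiles, mean, max
table(facilities$region)   # counts of facilities per region
ls()                       # objects currently in the workspace

# Interactive helpers, commented out because they open a separate pane or window:
# ?read.csv          # help page for read.csv
# View(facilities)   # spreadsheet-style viewer in RStudio
```

With a real dataset, the inline string would be replaced by a file path, for example `read.csv("path/to/file.csv")`, where the path is whatever location the file was saved to.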

Part 2: Getting Hands-on

Working Dataset: Ghana Health Facilities Dataset

The Power of R Packages

Building Your Data Pipeline

At School of Data, we like to think about the data analysis process as a pipeline. Below is a framework we usually use:

References & Resources