R is a powerful statistical computing and graphics language and environment used by many individuals and organisations in their day-to-day work with data. In this short training, we explore what data-driven journalism (DDJ) is, why R is a great tool for the modern data journalist, and how to get started with the key features of R for DDJ. By the end of the session, participants should:
- Have a definition of what data-driven journalism is
- Know what the process for doing data-driven journalism looks like
- Know the Data Pipeline used by School of Data
- Have RStudio set up and be familiar with some of its features
- Know some useful R commands for DDJ
- Have explored the ggplot2 package
- Know where to go for a deeper dive into R for DDJ

Milestone 1
- Define data-driven journalism (DDJ)
- Outline DDJ process
- Outline the Data Pipeline
- Explore why data journalists should care about R.
- References

Milestone 2
- Set up RStudio on local devices
- Set up RStudio through the rollApp cloud service
- Explore the RStudio environment and key features.
- Introduce some key R functions for DDJ.
- References

Milestone 3
- Concept of Grammar of Graphics.
- Introduction to ggplot2.
- Some graphs with ggplot2.
- References
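
To give a flavour of the grammar of graphics covered in Milestone 3, here is a minimal sketch. It assumes the ggplot2 package is already installed, and the `stories` data frame and its numbers are made up purely for illustration. The idea is that a plot is assembled from parts: a data set, a mapping from columns to visual properties (`aes`), and a geometric layer (here `geom_col` for bars).

```r
library(ggplot2)

# Hypothetical data: readers per newspaper section (made-up numbers).
stories <- data.frame(
  section = c("politics", "sport", "health", "culture"),
  readers = c(3000, 950, 400, 700)
)

# Data + aesthetic mapping + geometric layer + labels.
p <- ggplot(stories, aes(x = section, y = readers)) +
  geom_col() +
  labs(
    title = "Readers per section (illustrative data)",
    x = "Section",
    y = "Readers"
  )

# Draw the plot (in RStudio it appears in the Plots pane).
print(p)
```

Swapping `geom_col()` for another layer, such as `geom_point()`, changes the type of chart while the data and mapping stay the same.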
- TO DO: Define data-driven journalism (DDJ)
- TO DO: Outline DDJ process
- TO DO: Explore why data journalists should care about R.
- TO DO: A list of DDJ and R references
RStudio is a powerful Integrated Development Environment (IDE) that provides a convenient environment for running R-related tasks and projects. I will briefly review some of the key features of RStudio, but see the RStudio cheatsheet for more details.

Set Up RStudio on Your Computer
- Go to the CRAN website, download the version of R for your operating system, and install it on your computer.
- Go to the RStudio website, download the version of the RStudio IDE for your operating system, and install it on your computer.

Set Up RStudio with the rollApp Service
- Visit the rollApp website.
- Sign up for an account.
- Ensure you can open and interact with the RStudio application through the rollApp platform.
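
Whichever route you take, a quick check in the RStudio console can confirm that R is working. The sketch below is only a suggestion. The variable names `r_version` and `has_ggplot2` are made up for this example, and checking for ggplot2 is an assumption based on its use later in the training.

```r
# Store and print the version string of the running R installation.
r_version <- R.version.string
print(r_version)

# Check whether the ggplot2 package is available without loading it.
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

if (has_ggplot2) {
  message("ggplot2 is installed.")
} else {
  # Installation needs an internet connection, so it is only suggested here.
  message('ggplot2 is missing; run install.packages("ggplot2") to install it.')
}
```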
- Data Containers & Formats: vector, matrix, array, data frame, list, factors.
- Functions: str, length, dim, names, summary, ls, help/?, read.csv, table, View etc.
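
The containers and functions listed above can be tried out with a small example. This is a minimal sketch using made-up data: the `stories` data frame, its values, and the temporary CSV file are all invented for illustration.

```r
# A vector holds values of a single type.
readers <- c(1200, 950, 1800, 400)

# A factor stores categorical data with a fixed set of levels.
section <- factor(c("politics", "sport", "politics", "health"))

# A matrix is two-dimensional; an array can have more dimensions.
m <- matrix(1:6, nrow = 2, ncol = 3)
a <- array(1:24, dim = c(2, 3, 4))

# A data frame is a table whose columns can have different types.
stories <- data.frame(
  headline = c("Budget vote", "Cup final", "Election poll", "Flu season"),
  section = section,
  readers = readers
)

# A list can hold objects of any kind.
bundle <- list(data = stories, source = "made-up example")

str(stories)              # compact overview of the structure
length(readers)           # 4
dim(m)                    # 2 3
names(stories)            # column names
summary(stories$readers)  # minimum, quartiles, mean, maximum
table(stories$section)    # counts per category
ls()                      # objects in the current workspace

# Write the data to a temporary CSV file and read it back with read.csv.
csv_path <- tempfile(fileext = ".csv")
write.csv(stories, csv_path, row.names = FALSE)
reloaded <- read.csv(csv_path)

# In an interactive RStudio session you can also try:
# ?read.csv      # open the help page
# View(stories)  # open the data in a spreadsheet-style viewer
```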
At School of Data, we like to think about the data analysis process as a pipeline. Below is a framework we usually use:
- RStudio Visualisation with ggplot2 cheatsheet
- R Project
- Datacamp
- Hadley Wickham: follow on Twitter, @hadleywickham
- R-bloggers
- Flowing Data Website