
A short list of core R guides and galleries. Following these can lead you from zero experience to mastering the key skills of data analysis and presentation with R.

MIT License

R guides & galleries for data analysis & presentation

These guides and galleries can help you master the key skills of data analysis and presentation with R. I’ve put them together to help anyone totally new to R (like I was not long ago). While I know a lot more now, I still come back to all these guides and galleries to learn more.

Learn to analyse data with R

1️⃣ Learn data analysis in R from scratch:

2️⃣ Totally new to code or R? Scared you will fail and feel stupid?:

  • If you’ve never coded before, or you have struggled to improve your R skills, consider paying for gentle guided learning from DataCamp. Their Data Analyst with R skill track takes you through 16 courses in a logical order. They assume no prior knowledge or skills.

  • A free alternative, covering a more limited range of fundamental skills, is the interactive RStudio Primers.

3️⃣ Don’t train too much. Do real analysis early:

  • It can be tempting to complete many courses before you feel ready to try your own analysis. Instead, try writing your own simple analysis in R code as soon as you can, even if you don’t feel ready.

  • A good first project is replacing an Excel data task you know well with R code. Doing real things early is David Robinson’s philosophy, which he explains in the section titled Get students doing powerful things early.

  • Ideally, find a mentor to help you code your first projects. It’s a quicker way to learn.
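
To make the Excel-replacement idea concrete, here is a minimal sketch of a pivot-table style summary in base R. The `sales` data frame and its column names are made up for illustration; you would swap in your own spreadsheet data (for example, read in with `read.csv()`).

```r
# Hypothetical sales records, standing in for rows you might keep in a spreadsheet
sales <- data.frame(
  region  = c("North", "North", "South", "South", "South"),
  product = c("A", "B", "A", "A", "B"),
  revenue = c(100, 150, 200, 50, 300)
)

# Like a pivot table that sums revenue for each region
revenue_by_region <- aggregate(revenue ~ region, data = sales, FUN = sum)
print(revenue_by_region)

# Like a two-way pivot table: regions as rows, products as columns
revenue_pivot <- xtabs(revenue ~ region + product, data = sales)
print(revenue_pivot)
```

`aggregate()` and `xtabs()` are only one way to do this. Many people prefer the dplyr package for the same kind of task.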

4️⃣ Become an independent problem solver:

  • Don’t hesitate to ask for help or to Google your problem or question. Getting better at solving your own problems using Google and other resources is an important skill to learn too, even when you have become an expert R programmer.

  • This Getting Help in R blog tells you about the best places to find help. The more you look for help, the better you become at judging which web sites and people help you the most.

5️⃣ Write code in this clean style:

6️⃣ Repeating the same analysis regularly? Consider converting to an R package:

7️⃣ Use GitHub to save your code:

  • Start saving your code on GitHub even when you first start learning R. This is called versioning your code.

  • However, GitHub can be hard to learn if you only use the text commands typed at the command prompt. An easier way to learn GitHub with RStudio is through the menus and buttons, as explained in the Using git from RStudio tutorial.

  • GitHub can also be fiddly to get working with RStudio. This is why Jenny Bryan has written Happy Git and GitHub for the useR to help her students troubleshoot the many potential problems. Because of these challenges with GitHub, ideally find an experienced user of RStudio and GitHub to coach you in person.

Learn to visualise data with R

8️⃣ Explore your data first (EDA): Some R tools let you do Exploratory Data Analysis (EDA) interactively with a Point-and-Click interface. This can be faster and easier than coding alone, particularly when you first start to learn R. For example:

  • esquisse is an RStudio “Addin” that launches the Point-and-Click interface shown below. Use it to build simple ggplot plots. The tool also automatically generates the code you need for each plot that you create with it. Even advanced R users can find ggplot code tricky so esquisse can be useful for everyone. Here is a long list of many more RStudio Add-ins to try.

  • rpivotTable is the R version of Excel’s Pivot Tables and Charts. It’s great for quickly exploring your data with heat maps, bar charts, line charts etc. in an R Markdown html output document (R Markdown is described later).

  • Part-automate your EDA: The blog post Explore the landscape of R packages for automated data exploration, its detailed paper, and the associated autoEDA-resources GitHub repository describe many automatic tools you can try. You can view several examples of the automatic EDA reports these tools can create here. For example, here is a DataExplorer automatic report and this is an example from SmartEDA. Both are generated from just one line of code. DataExplorer and SmartEDA automatic reports work best when you first keep only relevant columns, choose the correct data type for each column (this could be done as you read in the data), and set one column as the response or outcome.
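
The preparation steps above might look something like this in practice. This is only a sketch: it uses the built-in `mtcars` data, the choice of columns and of `mpg` as the response is arbitrary, and `make_eda_report()` is a hypothetical helper name, not part of DataExplorer.

```r
# Step 1: keep only the columns relevant to your question (arbitrary choice here)
cars <- mtcars[, c("mpg", "cyl", "wt", "am")]

# Step 2: give each column the correct data type
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Step 3: choose one column as the response or outcome
response <- "mpg"

# Hypothetical helper that wraps the one-line DataExplorer call
# and fails with a clear message if the package is not installed
make_eda_report <- function(data, response) {
  if (!requireNamespace("DataExplorer", quietly = TRUE)) {
    stop("Please install the DataExplorer package first.")
  }
  DataExplorer::create_report(data, y = response)
}
```

With DataExplorer installed, running `make_eda_report(cars, response)` should write an html report to your working directory.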

9️⃣ Make Exploratory Data Analysis easy for yourself:

  • In his chapter Data exploration versus data presentation Claus Wilke suggests using any tool that makes data exploration quick and easy for you.

  • As your R skills improve you will gradually use R code more to explore your data. For example, Roger Peng’s Exploratory Data Analysis with R clearly explains how to explore data using R code that is both easy to understand and run for yourself.
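
As a taste of what exploring data in code looks like, here is a small sketch in base R using the built-in `airquality` data set. It is an illustration only, not an excerpt from Peng’s book.

```r
# Structure: column names, types and the first few values
str(airquality)

# Summary statistics for every column, including counts of missing values
summary(airquality)

# Number of missing values in each column
missing_counts <- colSums(is.na(airquality))
print(missing_counts)

# Distribution of a single variable
hist(airquality$Temp, main = "Daily temperature", xlab = "Temp (degrees F)")

# Relationship between two variables, ignoring rows with missing values
temp_ozone_cor <- cor(airquality$Temp, airquality$Ozone, use = "complete.obs")
print(temp_ozone_cor)
```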

1️⃣0️⃣ Pick the right visualisation to tell your data story:

  • Use this Data to Viz website to help you pick the right visualisation for the type of data.

  • Also, scan the left-hand contents panel of the Fundamentals of Data Visualisation guide by Claus Wilke to find the right visualisation for the point you are trying to make. The charts in his book are written in R code that you can find in its GitHub repository.

  • However, Wilke didn’t write the book to teach R code skills. It is tool and language neutral. Instead, learn how to create ggplot visualisations in R code with the R Graphics Cookbook from the broader Cookbook for R.

1️⃣1️⃣ Learn to build ggplots easily step-by-step:
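
As one illustration of the step-by-step idea, here is a minimal sketch that builds a ggplot one layer at a time. It assumes the ggplot2 package is installed and uses the built-in `mtcars` data; the variables and labels are arbitrary choices.

```r
library(ggplot2)

# Step 1: data and aesthetic mapping only (gives an empty panel with axes)
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Step 2: add a geometry layer so the points appear, coloured by cylinder count
p <- p + geom_point(aes(colour = factor(cyl)))

# Step 3: add a straight trend line fitted across all the points
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Step 4: finish with readable labels and a clean theme
p <- p +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
  theme_minimal()

print(p)
```

Printing `p` after each step is a handy way to see what every new layer adds.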

1️⃣2️⃣ Make your plots interactive:

  • From the Gallery of interactive R visualisations, I strongly recommend Plotly for a huge variety of easy-to-code interactive charts. For easy-to-code time series charts, dygraphs is visually exciting.

  • Creating interactive plots also improves your data exploration. Carson Sievert, who maintains the Plotly R package, demonstrates here how interactive plots can augment your data exploration if they are built quickly (as is possible with Plotly).
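
To show how little code a basic interactive chart needs, here is a minimal sketch using the plotly package’s `plot_ly()` function with the built-in `mtcars` data. It assumes plotly is installed, and the choice of variables is arbitrary.

```r
library(plotly)

# Interactive scatter plot: hover over a point to see its values,
# drag to zoom, and click legend entries to hide or show groups
fig <- plot_ly(
  data  = mtcars,
  x     = ~wt,
  y     = ~mpg,
  color = ~factor(cyl),
  type  = "scatter",
  mode  = "markers"
)

fig
```

If you already have a ggplot, plotly’s `ggplotly()` function can convert it to an interactive version.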

1️⃣3️⃣ Does a ggplot extension improve your story?: With this gallery of ggplot extensions, you can create a greater variety of ggplots that can improve the story you tell in data. The most popular extension is gganimate, described next.

1️⃣4️⃣ Consider animating your plots:

  • If movement of data points better explains the story you are telling, animate them with gganimate.

  • Learn how to animate ggplots intuitively using my guide inspired by a good Tweet.
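
For a rough idea of what gganimate code looks like, here is a minimal sketch that adds a transition to an ordinary ggplot. It assumes the ggplot2 and gganimate packages are installed; the data set and the choice of `gear` as the animated variable are arbitrary.

```r
library(ggplot2)
library(gganimate)

# An ordinary static ggplot: fuel economy by number of cylinders
static_plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Cylinders", y = "Miles per gallon")

# Adding a transition turns it into an animation that moves
# between the states of `gear`, pausing briefly on each one
anim <- static_plot +
  transition_states(gear, transition_length = 2, state_length = 1)
```

Printing `anim`, or passing it to `animate()`, renders the animation. Rendering usually needs an extra package such as gifski to produce a GIF.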

Putting it all together

1️⃣5️⃣ Be creative - but don’t break these rules! It is surprisingly easy to build bad or ugly visualisations and not realise how they could be better.

1️⃣6️⃣ Present your final story:

  • Tell an engaging data story in your final document. Claus Wilke’s chapter on telling a story and making a point shows you how to tell stories with engaging visuals that won’t confuse your audience.

  • On the Data Science competition website called Kaggle, browse the Kaggle “kernels” in R with the most votes. There are many ideas here for powerful story telling in data.

1️⃣7️⃣ Present your story with R Markdown:

  • One of the most effective formats for story telling in data is R Markdown, particularly if you create interactive html documents.

  • Here is a Quick Tour. Browse through the examples of others in RPubs. Once you are familiar with how to build basic R Markdown outputs in html, consider the many ways you can make them more engaging with the options described in R Markdown: The Definitive Guide. For example, an effective way to organise long data stories is to break them up into tabbed sections.
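
Tabbed sections come from adding `{.tabset}` to a heading in an R Markdown document. As a hedged sketch, the R code below writes a tiny example document to a temporary folder so you can try it. The file name, title and section contents are placeholders.

```r
# Three backticks, built in code so this example stays easy to copy and paste
fence <- strrep("`", 3)

# Lines of a small R Markdown document with two tabs under "Results"
rmd_lines <- c(
  "---",
  "title: \"Hypothetical data story\"",
  "output: html_document",
  "---",
  "",
  "## Results {.tabset}",
  "",
  "### Summary",
  "",
  paste0(fence, "{r}"),
  "summary(mtcars$mpg)",
  fence,
  "",
  "### Plot",
  "",
  paste0(fence, "{r}"),
  "plot(mtcars$wt, mtcars$mpg)",
  fence
)

# Save the document to a temporary folder
rmd_file <- file.path(tempdir(), "tabbed-story.Rmd")
writeLines(rmd_lines, rmd_file)
```

With the rmarkdown package installed, `rmarkdown::render(rmd_file)` should produce an html page where each `###` heading under “Results” becomes its own tab.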

1️⃣8️⃣ Watch a master story teller use R:

  • Even with only basic R skills you can learn a lot from watching an R expert rapidly explore and visualise data to tell a story. David Robinson records himself carrying out live data analysis in R using data he has never seen before. In this recording he explores wind turbine locations in the USA. It is one of the many TidyTuesday data sets he has coded live.

  • Here is an RStudio competition winning Shiny app that can show the most liked tweets for each data set: tidytuesday.rocks

1️⃣9️⃣ Telling good stories is hard because of the curse of knowledge:

  • Aim to tell your data story to someone who doesn’t know what you know about the data. But it is surprisingly hard to remember what it was like to not know what you now know, particularly after spending so long exploring the data set. This amnesia about your prior limited knowledge is called the curse of knowledge.

  • To defeat this curse, assume as little knowledge as possible in your final document. The more prior knowledge you assume, the more likely it is your text, code and visualisations lose your audience. Or even worse, you accidentally mislead them.

2️⃣0️⃣ Explain clearly. Democratise:

  • A clear plain style in your writing, code and visualisations aimed at the widest audience possible doesn’t have to be dumbed down or over simplified. You can still present technical topics. For example, here I have tried to explain key Natural Language Processing techniques in R while assuming no prior knowledge.

  • Don’t be a gatekeeper of your growing R knowledge and skills. Share, explain and democratise what you know. You can then move on to more complex analysis in R with an even higher value (as proposed by Richard Susskind in The Future of Professions).