A repository of R usage tips for data cleaning, data mining, data visualisation, statistical inference and machine learning

Primary language: R. License: Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0).

R tips

This repository contains R programming tips covering topics across data cleaning, data visualisation, machine learning, statistical theory and data productionisation.

Many kudos to Dr Chuanxin Liu, my former PhD student and code editor, for teaching me how to code in R in my past life as an immunologist.

Content summary

Legend Category
📚 Data cleaning
🎨 Data visualisation
🔮 Machine learning
🔨 Productionisation
🔢 Statistical theory

Tutorials

🎨 Data visualisation

📚 Data cleaning

🔨 Productionisation

🔮 Machine learning

🔢 Statistical theory

Other resources

The resources below also cover a comprehensive range of practical R tutorials.

Tutorial style guide

Inconsistent code style is a painful form of technical debt. This repository now follows the file naming and code style rules below.

  • Folders are no longer ordered with a numerical prefix and names are no longer case sensitive e.g. r_tips\tutorials\... and r_tips\figures\...
  • Tutorial subtopics share the same prefix e.g. r_tips\tutorials\dv-... and r_tips\tutorials\st-...
  • File names use - to separate file name prefixes and _ in place of white space e.g. r_tips\figures\dv-using_diagrammer-simple_flowchart.svg
  • Comments are styled according to the tidyverse style guide:
    • The first comment explains the purpose of the code chunk and is styled differently for enhanced readability e.g. # Code as header --------
    • Comments are written in sentence case and only end with a full stop if they contain at least two sentences
    • Short comments explaining a function argument do not have to be written on a new line
    • Comments should not be followed by a blank line, unless the comment is a stand-alone paragraph containing in-depth rationale or an alternative solution
  • R code chunks are styled as follows:
    • Each R chunk should be named with a short unique description written in the active voice e.g. create basic plot and modify plot labels
    • Arguments inside code chunks should not contain white space and boolean argument options should be written in capitals e.g. {r load libraries, message=FALSE, warning=FALSE}
    • To render the github document, results are generally suppressed using results='hide' and manually entered in a new line beneath the code.
    • To render the github document, figures are generally output using fig.show='hold' and figure outputs can then be suppressed at the local chunk level using fig.show='hide'
  • Set a margin of 80 characters in RStudio through Tools\Global options --> Code --> Display --> Show margin and use this margin as the cut-off for code and comment length
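
The comment rules above can be sketched as a short chunk body. This is only an illustration, assuming base R and the built-in mtcars dataset. The chunk header shown in the first comment and the object name mean_mpg are hypothetical and do not come from any tutorial in this repository.

```r
# Chunk header: {r summarise fuel economy, message=FALSE, results='hide'}

# Summarise fuel economy -------------------------------------------------------
# Group cars by cylinder count and calculate the mean miles per gallon
mean_mpg <- aggregate(
  mpg ~ cyl,
  data = mtcars,
  FUN = mean # Short argument comments can stay on the same line
)

mean_mpg
```

The header comment states the purpose of the chunk, the remaining comments are in sentence case without a trailing full stop, and every line stays within the 80 character margin.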

Citations

Citing packages is good practice when you are publishing research papers. To do this, use citation("package") to print the relevant package publication. A non-exhaustive list of R packages used in this repository is found below.

  • R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
  • Wickham et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686
  • Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
  • Matt Dowle and Arun Srinivasan (2021). data.table: Extension of data.frame. R package version 1.14.2. https://CRAN.R-project.org/package=data.table
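
As a minimal sketch of how entries like these can be retrieved, the snippet below uses citation() and toBibtex() from the utils package that ships with R. The "stats" package is used only because it is always installed, and the object names are illustrative. Replace "stats" with any package installed on your machine.

```r
# Print package citations ------------------------------------------------------
# Calling citation() without arguments returns the citation for R itself
r_citation <- citation()
print(r_citation)

# Cite an installed package by name
stats_citation <- citation("stats")
print(stats_citation)

# Convert a citation into a BibTeX entry for a reference manager
toBibtex(stats_citation)
```

The printed text depends on the installed R and package versions, so the output will not necessarily match the entries listed above.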