/Rtest_and_Rnotes

my R playground and notes

Primary language: HTML. License: GNU General Public License v3.0 (GPL-3.0)

A lot of this is quite old, and most of it is only for myself, not meant for other people, although there are a few files in here I use for teaching.

rhino location: ~/FH_fast_storage/git_more_repos/Rtest_and_Rnotes

Current learning to do list

Tidyverse style guide. I got up to the end of the 'syntax' section. Perhaps see also the Advanced R style guide.

R stuff: for file lists, check out dir():

# pattern is a regular expression, not a glob
files <- dir(here("data", "participants"), pattern = "\\.csv$")

for reading multiple files, check out purrr::map_df:

data <- files %>%
    map_df(~read_csv(file=here("data", "participants", .x)))
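A base-R-only sketch of the same list-then-read pattern, in case it helps to see it without here or purrr. The temporary folder, file names and column names are made up for the demo.

```r
# Hypothetical demo data: two small CSV files in a temporary directory
demo_dir <- file.path(tempdir(), "participants_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, score = c(10, 20)),
          file.path(demo_dir, "p1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(30, 40)),
          file.path(demo_dir, "p2.csv"), row.names = FALSE)

# pattern is a regular expression; full.names = TRUE returns paths we can read directly
csv_files <- dir(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# read each file, then stack the resulting data frames row-wise
combined <- do.call(rbind, lapply(csv_files, read.csv))
nrow(combined)  # 4
```

Using full.names = TRUE avoids having to paste the folder back onto each file name before reading.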

downloading files within R:

download.file(url, destfile="data-raw/name-of-file.xlsx")

there's a package called 'readxl' that is part of the tidyverse but not the core tidyverse, so it needs its own library() call. readxl::read_excel() has a sheet option
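A minimal sketch of the sheet option, assuming readxl is installed. It uses the example workbook that ships with readxl rather than any file from this repo, and the object names are made up.

```r
library(readxl)

# readxl ships a few example workbooks; readxl_example() returns the path to one
xlsx_path <- readxl_example("datasets.xlsx")

# list the sheet names in the workbook
excel_sheets(xlsx_path)

# sheet can be given as a name or as a position
mtcars_sheet <- read_excel(xlsx_path, sheet = "mtcars")
second_sheet <- read_excel(xlsx_path, sheet = 2)
```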

furrr R package is like purrr, but parallelized

figure out gheatmap in ggtree

joins using plyranges R package - different types https://www.bioconductor.org/packages/devel/bioc/vignettes/plyranges/inst/doc/an-introduction.html#9_Learning_more

learn about the other R pipe - |>
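A quick sketch of the native pipe, assuming R 4.1.0 or later. The values are made up.

```r
# the left-hand side becomes the first argument of the call on the right
total <- c(1, 4, 9) |> sqrt() |> sum()
total  # 6

# the right-hand side must be a function call, so an anonymous function
# needs wrapping in parentheses and then calling with ()
squares <- 1:3 |> (\(v) v^2)()
squares  # 1 4 9
```

Unlike %>%, the native pipe needs no package loaded.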

writing R packages and documentation - https://style.tidyverse.org/documentation.html

R learning - data cleaning: a primer, some tips and a 15 min video

Resources

R primers

Nick Tierney's (mostly) rstats blog

R For The Rest Of Us resources

Advice on making figures

The R graph gallery (https://r-graph-gallery.com) and a list of packages that extend ggplot

Some notes on good coding practices - using Rmarkdown, clean environments, reproducibility, Rprojects

Rmarkdown:

miscellaneous cool-looking R tips from Luke Pemberton, like embedding smaller plots as insets on top of bigger ones, including colors in titles, nice axis formatting, etc.

haven't looked yet: future

The Elements of Data Analytic Style by Jeff Leek (a Leanpub book)

Data Science resources list

Things I've learned

R does rounding weirdly!
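A small demonstration of the behaviour. round_half_away() is a made-up base-R helper for whole-number rounding only, not the janitor implementation; janitor::round_half_up() is the packaged alternative.

```r
# base R rounds exact halves to the nearest even number
round(c(0.5, 1.5, 2.5, 12.5))  # 0 2 2 12

# hypothetical helper: round exact halves away from zero instead
round_half_away <- function(x) sign(x) * floor(abs(x) + 0.5)
round_half_away(c(0.5, 1.5, 2.5, 12.5, -12.5))  # 1 2 3 13 -13
```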

See here

"The round() function in Base R will round to the nearest whole number and ‘rounding to the even number’ when equidistant, meaning that exactly 12.5 rounds to the integer 12. Note that the janitor package in R contains a function round_half_up() that rounds away from zero. in this case it rounds to the nearest whole number and ‘away from zero’ or ‘rounding up’ when equidistant, meaning that exactly 12.5 rounds to the integer 13."

Rstudio tricks

"Reindenting your code only shifts things around horizontally. If you want more powerful code reformatting, try using “Code > Reformat Code” (or use ⌘⇧A on macOS or ctrl + shift + A on Windows). It’s a more aggressive form of reformatting that will add extra line breaks and other things to make the code more readable."

Snippets

Example: type fun and press the tab key, and RStudio inserts the skeleton of a new function

To see all snippets: Tools - Edit Code Snippets

Debugging

Three options:

  • browser() (place inside a function, temporarily)
  • debug(myFunction) plus undebug(myFunction)
  • debugonce()
    See Rscripts/explore_debugging_functions.R for details.
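A rough sketch of how the three options are switched on and off. add_one() is a made-up function, and the sketch deliberately never calls it while a debug flag is set, so it also runs non-interactively.

```r
# hypothetical function to debug
add_one <- function(x) {
  # browser()  # uncomment temporarily to pause here whenever the function runs
  x + 1
}

debug(add_one)                 # every call now steps through in the debugger
flag_on <- isdebugged(add_one)
undebug(add_one)               # switch the flag off again
flag_off <- isdebugged(add_one)

debugonce(add_one)             # debugger would start on the next call only
undebug(add_one)               # cancelled here, since this sketch never makes that call

c(flag_on, flag_off)  # TRUE FALSE
```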

Miscellaneous

The 'embracing' operator ({{ }}), and unquoting using !! and !!! - see testCode.R for details.
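A separate minimal sketch (not the contents of testCode.R), assuming dplyr and rlang are installed. mean_by() and the variable names are made up.

```r
library(dplyr)

# hypothetical helper: group_var and value_var are passed as bare column names,
# and {{ }} forwards them into the dplyr verbs
mean_by <- function(data, group_var, value_var) {
  data %>%
    group_by({{ group_var }}) %>%
    summarise(mean_value = mean({{ value_var }}), .groups = "drop")
}

by_cyl <- mean_by(mtcars, cyl, mpg)
by_cyl

# !! unquotes one stored symbol; here the column name starts out as a string
col_name <- "mpg"
overall <- mtcars %>% summarise(mean_value = mean(!!rlang::sym(col_name)))

# !!! splices a list of symbols in as separate arguments
group_cols <- c("cyl", "gear")
counts <- mtcars %>% count(!!!rlang::syms(group_cols))
```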

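& versus && (and | versus ||): use the short form for element-wise operations on vectors. Use the long form when we want a single TRUE/FALSE answer: it only accepts length-one inputs, and it stops evaluating as soon as the answer is known. The any and all functions run OR and AND across all elements of a vector

x <- c(TRUE,TRUE,FALSE,FALSE)
y <- c(TRUE,FALSE,TRUE,FALSE)
x & y   ## TRUE FALSE FALSE FALSE
# x && y # this is no good! (an error in R >= 4.3.0, because x and y are longer than 1)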
any(x)  ## TRUE
all(x)  ## FALSE
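A short sketch of where the long form earns its keep: guarding a test inside if(). The variable z and the result strings are made up.

```r
# hypothetical input that might be missing
z <- NULL

# && stops at the first FALSE, so z > 0 is never evaluated here;
# on its own, NULL > 0 gives logical(0), which if() rejects with an error
if (is.numeric(z) && z > 0) {
  result <- "positive number"
} else {
  result <- "something else"
}
result  # "something else"

# to reduce whole vectors to one TRUE/FALSE, wrap the element-wise result
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)
any(x & y)  # TRUE
all(x | y)  # FALSE
```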

The switch function - a multiway if statement: it evaluates and returns the branch whose name matches its first argument

centre <- function(x, type) {
  switch(type,
         mean = mean(x),
         median = median(x),
         trimmed = mean(x, trim = .1))
}
x <- rcauchy(10)
centre(x, "mean")
centre(x, "median")
centre(x, "trimmed")

In switch, if there's an empty argument, it 'falls through' to the next thing (e.g. here, myFunc("a") returns the same thing as myFunc("b")). Note also that we can add call. = FALSE to a stop statement so that the error message leaves out the function call

myFunc <- function(x) {
  switch(x, 
    a = ,
    b = 1, 
    c = 2,
    stop("Unknown `x`", call. = FALSE)
  )
}
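A usage sketch for the fall-through and the error branch. code_for() is a renamed copy of the function above, repeated so the snippet runs on its own.

```r
# hypothetical copy of the function above
code_for <- function(x) {
  switch(x,
    a = ,   # empty argument: falls through to b
    b = 1,
    c = 2,
    stop("Unknown `x`", call. = FALSE)
  )
}

code_for("a")  # 1
code_for("b")  # 1

# capture the error from the unnamed default branch and look at its message
msg <- tryCatch(code_for("z"), error = function(e) conditionMessage(e))
msg  # "Unknown `x`"
```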

Useful packages

On a Mac, R packages go here - /Library/Frameworks/R.framework/Versions - in subdirectories by version. After installing a new version of R, the old version's packages can be deleted to save disk space

combining plots

patchwork package is great

more control over axes and legends

legendry looks useful

importing images and combining with R plots

here's how you'd combine an imported image (import using magick package) with a ggplot:

library(patchwork)
library(ggplot2)
library(magick)

plt1 <- image_read("https://bellard.org/bpg/2.png") %>%
  image_ggplot()
plt2 <- iris %>% 
  ggplot(aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point()

plt1 | plt2

pretty tables in Rmarkdown (etc)

kable/kableExtra

flextable - see Rscripts/flextable_demo.md

reactable

emphatic - see Rscripts/emphatic_demo.md

tinytable

gt can make multicolumn tables, i.e. can wrap a very long table. That same tutorial shows how to make a multicolumn table, how to include little logos within each cell, and how to make a nice-looking two-part footnote.

(and I think some others)

Volcano plots

EnhancedVolcano

fastq files

ShortRead package

ngsReports package - ~/public_databases/NCBI/SRA/data/mammalian_expression_profiles/human/human_spermMicrobiomeTotalRNAseq/fastqc/parseFastqc.R

multiple sequence alignments (MSAs)

a list of tools for viewing MSAs

viewing MSA alongside a tree: try ggtree's msaplot() function. also "ggtreeExtra() has a different way to do it which is probably more flexible"

phylogenetics

ape

ggtree (also tidytree and treeio) (can parse PAML and Hyphy output as well as make some very nice plots). ggtree publication

ggtreeExtra. ggtree can use geom_facet to align associated graphs to the tree but it only works with rectangular, roundrect, ellipse and slanted layouts. ggtreeExtra allows graphs on a tree in rectangular, circular, fan and radial layouts

in ~/domesticated_capsid/Rreports/RTL3_frameshift_plots_v2_aln28.Rmd I got a tree of >5000 mammal species from Upham publication, and extracted the species I want

plotting genes etc:

gggenes - demo shows it only plotting gene arrows, not other data. Also suggests: gggenomes for visualising comparative genomics, plasmapR for quickly drawing plasmid maps from GenBank files

ggbio - not sure whether it is still maintained

ggcoverage

Gviz is what I used for the tetrahymena project, and Michelle's project, and SATAY data. Seems to be maintained and very functional.

GenVisR

for genomes with karyotypes: karyoploteR

rtracklayer can make plots by interacting with a UCSC browser

igvR can interact with IGV

tidyGenomeBrowser

GenomicPlot is more for making metaplots combining data over multiple features

Explored a few options in April 2024 for the SATAY data - see ~/FH_fast_storage/forOtherPeople/forGrantKing/SATAY/janet_Rscripts/ files browser_style_plots_failed_attempts.Rmd and browser_style_plots.Rmd

wordclouds

wordcloud and wordcloud2 packages. see /Volumes/malik_h/user/jayoung/presentations/MalikLab/otherSlides_mine/KennedyHighSchoolVisit_2021_Dec7/Hutch_wordCloud.R

violin plots

use ggplot - geom_violin().
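A minimal sketch using the built-in iris data, assuming ggplot2 is installed. The jittered points are an optional extra, and violin_plot is a made-up name.

```r
library(ggplot2)

# one violin per species
violin_plot <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin(bw = "nrd0") +              # "nrd0" is also the default bandwidth rule
  geom_jitter(width = 0.1, alpha = 0.5)   # overlay the raw points

violin_plot
```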

Some other options are vioplot::vioplot(), DescTools::PlotViolin(), easyGgplot2::ggplot2.violinplot(), UsingR::violinplot().

Before the days of ggplot, I noted that I like PlotViolin better than vioplot, but when I run it on large datasets it is very slow if I allow it to use its default bandwidth selection algorithm. If I specify the bw="nrd0" option, it is MUCH quicker.

See also here

other packages

flowchart and ggflowchart packages

ggarrow and arrowheadr packages for nicer looking arrows

gcplyr package for microbial growth curves. Can help read platereader data (with metadata) in and get it in a tidy format. Can model various parameters of growth curves, "like growth rate/doubling time, maximum density (carrying capacity), lag time, area under the curve, diauxic shifts, extinction, and more without fitting an equation for growth to your data."

ggplot themes

https://rfortherestofus.com/2019/08/themes-to-improve-your-ggplot-figures/