my R playground and notes
A lot of this is quite old, and most of it is only for myself, not meant for other people, although there are a few files in here I use for teaching.
rhino location: ~/FH_fast_storage/git_more_repos/Rtest_and_Rnotes
Tidyverse style guide. I got up to the end of the 'syntax' section. Perhaps see also the Advanced R style guide.
R stuff:
for file lists, check out dir():
# pattern is a regular expression, not a shell glob
files <- dir(here("data", "participants"), pattern="\\.csv$")
for reading multiple files, check out purrr::map_df:
data <- files %>%
  map_df(~read_csv(file=here("data", "participants", .x)))
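A base-R sketch of the same list-then-read pattern, in case it helps to see it without purrr or readr. The temporary directory and the file names are invented for the example; `do.call(rbind, lapply(...))` is a stand-in for `map_df`, not what purrr does internally.

```r
# Made-up demo files in a temporary directory (names are hypothetical)
tmp_dir <- file.path(tempdir(), "participants_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, score = c(10, 20)),
          file.path(tmp_dir, "p01.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(30, 40)),
          file.path(tmp_dir, "p02.csv"), row.names = FALSE)
writeLines("not a csv", file.path(tmp_dir, "notes.txt"))

# `pattern` is a regular expression; glob2rx() converts a glob if that reads better
csv_files  <- dir(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
same_files <- dir(tmp_dir, pattern = glob2rx("*.csv"), full.names = TRUE)

# Read every file and stack the rows into one data frame
all_data <- do.call(rbind, lapply(csv_files, read.csv))
```

`full.names = TRUE` returns complete paths, so there is no need to paste the directory back on when reading.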
downloading files within R:
download.file(url, destfile="data-raw/name-of-file.xlsx")
there's a package called 'readxl' part of tidyverse, but not core tidyverse. readxl::read_excel(). Has a sheet option
furrr R package is like purrr, but parallelized
figure out gheatmap in ggtree
joins using plyranges R package - different types https://www.bioconductor.org/packages/devel/bioc/vignettes/plyranges/inst/doc/an-introduction.html#9_Learning_more
learn about the other R pipe - |>
writing R packages and documentation - https://style.tidyverse.org/documentation.html
- R packages that help develop and maintain and test packages are {devtools}, {testthat}, {usethis}
- software design principles - rather than my own single bloated package, think about a universe of smaller packages. Functions and packages both benefit from being as modular as possible. https://milesmcbain.xyz/posts/data-analysis-reuse/
- R package writing beginners tips https://www.youtube.com/watch?v=F1GJSn9SqTk
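On the native pipe (`|>`) mentioned in the list above: a minimal sketch of how it behaves, assuming R >= 4.1 for the pipe and the `\(x)` lambda shorthand, and R >= 4.2 for the `_` placeholder. The variable names are only for illustration.

```r
# The native pipe passes the left-hand side as the first argument of the call
total <- c(1, 4, 9) |> sqrt() |> sum()

# An anonymous function on the right-hand side must be wrapped in parentheses and called
squares <- 1:3 |> (\(v) v^2)()

# The `_` placeholder (R >= 4.2) must be given to a named argument
fit <- mtcars |> lm(mpg ~ wt, data = _)
```

Unlike `%>%`, the right-hand side always has to be a function call, so `x |> sqrt` without parentheses is a syntax error.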
R learning - data cleaning: a primer, some tips and a 15 min video
Nick Tierney's (mostly) rstats blog
R For The Rest Of Us resources
Advice on making figures
The R graph gallery(https://r-graph-gallery.com) and a list of packages that extend ggplot
Some notes on good coding practices - using Rmarkdown, clean environments, reproducibility, Rprojects
Rmarkdown:
- Rstudio's intro to Rmarkdown
- intro2r chapter 8
- detailed Rmarkdown guide
miscellaneous cool-looking R tips from Luke Pemberton, like embedding smaller plots as insets on top of bigger ones, including colors in titles, nice axis formatting, etc, etc
The Elements of Data Analytic Style by Jeff Leek (a Leanpub book)
Data Science resources list
See here
"The round()
function in Base R will round to the nearest whole number and ‘rounding to the even number’ when equidistant, meaning that exactly 12.5 rounds to the integer 12. Note that the janitor package in R contains a function round_half_up() that rounds away from zero. in this case it rounds to the nearest whole number and ‘away from zero’ or ‘rounding up’ when equidistant, meaning that exactly 12.5 rounds to the integer 13."
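A quick sketch of the two rounding rules described in that quote. `round_half_up_sketch()` is a hypothetical helper written for this note, not janitor's implementation; it only illustrates the "halves go away from zero" idea.

```r
# Base R rounds exact halves to the even neighbour
round(12.5)   # 12
round(13.5)   # 14
round(-12.5)  # -12

# Hypothetical helper (NOT janitor::round_half_up): halves go away from zero
round_half_up_sketch <- function(x, digits = 0) {
  scale <- 10^digits
  sign(x) * floor(abs(x) * scale + 0.5) / scale
}

round_half_up_sketch(12.5)   # 13
round_half_up_sketch(-12.5)  # -13
```

Values that are not exactly representable in floating point (e.g. 0.15) may not sit exactly on the half, so results near a half can still surprise under either rule.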
"Reindenting your code only shifts things around horizontally. If you want more powerful code reformatting, try using “Code > Reformat Code” (or use ⌘⇧A on macOS or ctrl + shift + A on Windows). It’s a more aggressive form of reformatting that will add extra line breaks and other things to make the code more readable."
Example: type fun and press the tab key, and RStudio provides the skeleton of a new function
To see all snippets: Tools - Edit Code Snippets
Three options:
- browser() (place inside a function, temporarily)
- debug(myFunction) plus undebug(myFunction)
- debugonce()
See [explore_debugging_functions.R](Rscripts/explore_debugging_functions.R) for details.
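A small non-interactive sketch of the debug()/undebug() pairing from the options above. `myFunction` here is a made-up function, and the flag variables are only there to show the state changing; none of this opens the debugger because the function is only called once the flag is cleared.

```r
# Hypothetical function, used only to show the debug flag being set and cleared
myFunction <- function(x) {
  y <- x * 2
  y + 1
}

debug(myFunction)                            # every later call would open the step-through debugger
flag_after_debug <- isdebugged(myFunction)   # TRUE

undebug(myFunction)                          # switch debugging off again
flag_after_undebug <- isdebugged(myFunction) # FALSE

result <- myFunction(2)                      # runs normally now

# Not run here: debugonce(myFunction) debugs only the next call,
# and browser() placed inside the function body pauses at that exact line.
```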
The 'embracing' operator ({{ }}), and unquoting using !! and !!! - see testCode.R for details.
& versus && (and | versus ||): use the short form for element-wise operations on vectors. Use the long form when we want a single TRUE/FALSE answer. The any and all functions run OR and AND across all elements of a vector
x <- c(TRUE,TRUE,FALSE,FALSE)
y <- c(TRUE,FALSE,TRUE,FALSE)
x & y
# x && y # this is no good! In R >= 4.3 it is an error, because both sides have length > 1
any(x) ## TRUE
all(x) ## FALSE
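A slightly longer sketch of when each form fits, including the short-circuit behaviour of `&&`. The variable names are just for illustration.

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

# Element-wise: one result per position
elementwise <- x & y

# To get a single answer from vectors, reduce them first, then combine with &&
both_all_true <- all(x) && all(y)
any_overlap   <- any(x & y)

# && and || short-circuit: the right side is skipped when the left side already
# decides the result, so the NULL check guards the indexing here
val <- NULL
is_positive <- !is.null(val) && val[1] > 0
```

This is why `&&` and `||` are the usual choice inside `if ()` conditions, and `&` and `|` inside things like subsetting or `dplyr::filter()`.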
The switch function - a multiway if statement, I think?
centre <- function(x, type) {
  switch(type,
    mean = mean(x),
    median = median(x),
    trimmed = mean(x, trim = .1))
}
x <- rcauchy(10)
centre(x, "mean")
centre(x, "median")
centre(x, "trimmed")
In switch, if there's an empty argument, it 'falls-through' to the next thing (e.g. here, myFunc("a") returns the same thing as myFunc("b")).
Note also that we can add call. = FALSE to a stop statement so that the error message leaves out the function call that raised it
myFunc <- function(x) {
  switch(x,
    a = ,
    b = 1,
    c = 2,
    stop("Unknown `x`", call. = FALSE)
  )
}
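A sketch that exercises the fall-through and the effect of `call. = FALSE`. `myFunc` is repeated so the block runs on its own; `noisyFunc` is a hypothetical counterpart added only for comparison.

```r
# Same definition as above, repeated so this block is self-contained
myFunc <- function(x) {
  switch(x,
    a = ,
    b = 1,
    c = 2,
    stop("Unknown `x`", call. = FALSE)
  )
}

res_a <- myFunc("a")   # falls through to the value for "b"
res_b <- myFunc("b")
res_c <- myFunc("c")

# Hypothetical comparison function that keeps the default call. = TRUE
noisyFunc <- function(x) stop("Unknown `x`")

# conditionCall() shows which call, if any, is attached to the error
quiet_call <- tryCatch(myFunc("z"),    error = function(e) conditionCall(e))
noisy_call <- tryCatch(noisyFunc("z"), error = function(e) conditionCall(e))
```

With `call. = FALSE` the error prints as `Error: Unknown `x``; without it, the message is prefixed with the call, as in `Error in noisyFunc("z") : ...`.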
On a Mac, R packages go here - /Library/Frameworks/R.framework/Versions - in subdirectories by version. After installing a new R version, old packages can be deleted to save disk space
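To check where the current session actually looks for packages, and which per-version directories exist, something like this should work. The variable names are made up, and the Versions path is the macOS location noted above.

```r
# Library directories this R session searches for installed packages
lib_paths <- .libPaths()
print(lib_paths)

# Version of the running R, to compare against the Versions subdirectories
r_version <- paste(R.version$major, R.version$minor, sep = ".")

# On macOS this lists the per-version directories; if the path does not exist
# (e.g. on other systems) it returns character(0)
old_versions <- list.dirs("/Library/Frameworks/R.framework/Versions",
                          recursive = FALSE)
```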
patchwork package is great
legendry looks useful
here's how you'd combine an imported image (import using magick package) with a ggplot:
library(patchwork)
library(ggplot2)
library(magick)
plt1 <- image_read("https://bellard.org/bpg/2.png") %>%
  image_ggplot()
plt2 <- iris %>%
  ggplot(aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point()
plt1 | plt2
kable/kableExtra
flextable - see Rscripts/flextable_demo.md
emphatic - see Rscripts/emphatic_demo.md
gt can make multicolumn tables, i.e. can wrap a very long table. That same tutorial shows how to make a multicolumn table, how to include little logos within each cell, and how to make a nice-looking two-part footnote.
(and I think some others)
ShortRead package
ngsReports package - ~/public_databases/NCBI/SRA/data/mammalian_expression_profiles/human/human_spermMicrobiomeTotalRNAseq/fastqc/parseFastqc.R
a list of tools for viewing MSAs
viewing MSA alongside a tree: try ggtree's msaplot() function. also "ggtreeExtra() has a different way to do it which is probably more flexible"
ape
ggtree (also tidytree and treeio) (can parse PAML and Hyphy output as well as make some very nice plots). ggtree publication
ggtreeExtra. ggtree can use geom_facet to align associated graphs to the tree but it only works with rectangular, roundrect, ellipse and slanted layouts. ggtreeExtra allows graphs on a tree in rectangular, circular, fan and radial layouts
in ~/domesticated_capsid/Rreports/RTL3_frameshift_plots_v2_aln28.Rmd I got a tree of >5000 mammal species from Upham publication, and extracted the species I want
gggenes - demo shows it only plotting gene arrows, not other data. Also suggests: gggenomes for visualising comparative genomics, plasmapR for quickly drawing plasmid maps from GenBank files
ggbio - not sure whether it is still maintained
Gviz is what I used for the tetrahymena project, and Michelle's project, and SATAY data. Seems to be maintained and very functional.
for genomes with karyotypes: karyoploteR
rtracklayer can make plots by interacting with a UCSC browser
igvR can interact with IGV
GenomicPlot is more for making metaplots combining data over multiple features
Explored a few options in April 2024 for the SATAY data - see ~/FH_fast_storage/forOtherPeople/forGrantKing/SATAY/janet_Rscripts/ files browser_style_plots_failed_attempts.Rmd and browser_style_plots.Rmd
wordcloud and wordcloud2 packages. see `/Volumes/malik_h/user/jayoung/presentations/MalikLab/otherSlides_mine/KennedyHighSchoolVisit_2021_Dec7/Hutch_wordCloud.R`
use ggplot - geom_violin().
Some other options are vioplot::vioplot(), DescTools::PlotViolin(), easyGgplot2::ggplot2.violinplot(), UsingR::violinplot().
Before the days of ggplot, I noted that I like PlotViolin better than vioplot, but when I run it on large datasets it is very slow if I allow it to use its default bandwidth selection algorithm. If I specify the bw="nrd0" option, it is MUCH quicker.
See also here
flowchart and ggflowchart packages
ggarrow and arrowheadr packages for nicer looking arrows
gcplyr package for microbial growth curves. Can help read platereader data (with metadata) in and get it in a tidy format. Can model various parameters of growth curves, "like growth rate/doubling time, maximum density (carrying capacity), lag time, area under the curve, diauxic shifts, extinction, and more without fitting an equation for growth to your data."
https://rfortherestofus.com/2019/08/themes-to-improve-your-ggplot-figures/