- Docker image: Docker Hub
- Workshop material: pkgdown website
- Video: YouTube
This workshop gives an introductory overview of the DelayedArray framework, which can be used by R / Bioconductor packages to support the analysis of large array-like datasets. A DelayedArray is like an ordinary array in R, but allows for the data to be in-memory, on-disk in a file, or even hosted on a remote server.
Workshop participants will learn where they might encounter a DelayedArray in the wild while using Bioconductor and understand the fundamental concepts underlying the DelayedArray framework. This workshop will feature introductory material, ‘live’ coding, and Q&A, all of which are adapted from the content below.
- Basic knowledge of R syntax.
- Familiarity with common operations on matrices in R, such as
colSums()
andcolMeans()
. - Some familiarity with S4 objects may be helpful but is not required.
Students will be able to run code examples from the workshop material. There will be a Q&A session in the second half of the workshop.
These packages are the focus of this workshop:
Please see the workshop
DESCRIPTION
for a full list of dependencies.
Activity | Time |
---|---|
Introductory material | 8 min |
First contact | 30 min |
Workflow tips for DelayedArray-backed analyses | 5 min |
Q&A | 12 min |
- Learn of existing packages and functions that use the DelayedArray framework.
- Develop a high-level understanding of classes and packages that implement the DelayedArray framework.
- Become familiar with the fundamental concepts of delayed operations, block processing, and realization.
- Reason about potential bottlenecks, and how to avoid or reduce these, in algorithms operating on DelayedArray objects.
- Identify when an object is a DelayedArray or one of its derivatives.
- Be able to recognise when it is useful to use a DelayedArray instead of an ordinary array or other array-like data structure.
- Learn how to load and save a DelayedArray-backed object.
- Learn how the ‘block size’ and ‘chunking’ of the dataset affect performance when operating on DelayedArray objects.
- Take away some miscellaneous tips and tricks I’ve learnt over the years when working with DelayedArray-backed objects.
This workshop uses Bioconductor version 3.12. At the time of writing, this is the current ‘devel’ version of Bioconductor, which can be installed following these instructions.
You can then install the packages necessary for this workshop using the following:
library(BiocManager)
install(c("DelayedArray", "HDF5Array", "ExperimentHub", "DelayedMatrixStats",
"BiocPkgTools", "rmarkdown", "BiocStyle", "rhdf5", "SingleCellExperiment",
"scuttle", "knitr", "DT"))
Alternatively, you can might like to use Docker to runt he workshop in a container with R, all the necessary packages, and RStudio. This can be done as follows:
- Run
docker run -e PASSWORD=delayedarray -p 8787:8787 -d --rm petehaitch/bioc2020_delayedarray_workshop
. Use-v $(pwd):/home/rstudio
argument to map your local directory to the container. - Log in to RStudio at http://localhost:8787 using username
rstudio
and passwordyourpassword
. Note that on Windows you need to provide your localhost IP address likehttp://191.163.92.108:8787/
- find it usingdocker-machine ip default
in Docker’s terminal. - Run
browseVignettes(package = "DelayedArrayWorkshop")
. Click on one of the links, “HTML”, “source”, “R code”.- In case of
The requested page was not found
error, addhelp/
to the URL right after the hostname, e.g., http://localhost:8787/help/library/DelayedArrayWorkshop/doc/Effectively_using_the_DelayedArray_framework_for_users.html. This is a known bug.
- In case of