/recountWorkshop2020

BioC2020 workshop: Human RNA-seq data from recount2 and related packages

Primary LanguageHTML

Human RNA-seq data from recount2 and related packages

Lifecycle: stable BioC status R build status

Instructor

Leonardo Collado-Torres

Workshop Description

The recount2 project re-processed RNA sequencing (RNA-seq) data on over 70,000 human RNA-seq samples spanning a variety of tissues, cell types and disease conditions. Researchers can easily access these data via the recount Bioconductor package, and can quickly import gene, exon, exon-exon junction and base-pair coverage data for uniformly processed data from the SRA, GTEx and TCGA projects in R for analysis. This workshop will include an overview of the recount2 project as well as methods and tools that have been to improve it since 2017. The overview will be followed by a hands-on session where you will dive into recount and related packages along the lines of the recount workflow. We will also cover new related packages such as GenomicState.

In more detail, this workshop will cover different use cases of the recount package, including downloading and normalizing data, processing and cleaning relevant phenotype data (including recount-brain), performing differential expression (DE) analyses using other Bioconductor packages. The workshop will also cover how to use the base-pair coverage data for annotation-agnostic DE analyses and for visualizing coverage data for features of interest. After taking this workshop, attendees will be ready to enhance their analyses by leveraging RNA-seq data from 70,000 human samples.

recount2 website, recount package, paper, recount workflow.

Prior editions: Bioc2017, BioC2019.

Pre-requisites

  • Basic knowledge of R syntax
  • Basic knowledge of RNA-seq or gene expression data
  • Familiarity with SummarizedExperiment vignette
  • Familiarity with the RangedSummarizedExperiment class

Workshop Participation

Students are expected to ask questions during the recount2 overview presentation and to bring their own laptop so they can follow the hands-on portion of the workshop.

The slides for the workshop are available through speakerdeck.

<script async class="speakerdeck-embed" data-id="78092f5c558242f1aa3085d9abc9c061" data-ratio="1.77777777777778" src="//speakerdeck.com/assets/embed.js"></script>

You can download a Docker image with all the workshop files using:

docker run -e PASSWORD=bioc2020 -p 8787:8787 -d --rm lcollado/recountworkshop2020

Then, log in to RStudio at http://localhost:8787 using username rstudio and password bioc2020. Note that on Windows you need to provide your localhost IP address like http://191.163.92.108:8787/ - find it using docker-machine ip default in Docker’s terminal.

To see the vignette on RStudio’s window (from the docker image), run browseVignettes(package = "recountWorkshop2020"). Click on one of the links, “HTML”, “source”, “R code”. In case of The requested page was not found error, add help/ to the URL right after the hostname, e.g., http://localhost:8787/help/library/recountWorkshop2020/doc/recount-workshop.html. This is a known bug.

Bioconductor packages used

SummarizedExperiment, recount, GenomicRanges, DESeq2, derfinder, regionReport are among the packages that will be explicitly covered.

Time outline

Activity Time
recount2 project overview 10m
accessing gene count data with recount 15m
Gene-level analysis with recount2 and recount-brain 15m
Un-annotated transcriptome methods overview: derfinder, GenomicState 5 min
the future: recount3 5m

Total: a 50-55 minute session.

Workshop goals and objectives

Learning goals

  • Understand the publicly data available via the recount2 project beyond gene counts
  • Identify how recount2 data could be useful for your RNA-seq projects and how you can use it
  • Describe the annotation-agnostic methods powered by recount2 that complement other RNA-seq analysis pipelines

Learning objectives

  • Re-analyze public RNA-seq data using the recount Bioconductor package
  • Recognize how to access the different types of data hosted by recount2
  • Locate resources that enhance recount2 such as recount-brain, derfinder, and GenomicState

Installation instructions

Get the latest stable R release from CRAN. Then install recountWorkshop2020 using from GitHub with the following code:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("LieberInstitute/recountWorkshop2020")

Citation

Below is the citation output from using citation('recount in R. Please run this yourself to check for any updates on how to cite recount.

print(citation("recount")[2], bibtex = TRUE)
#> 
#> Collado-Torres L, Nellore A, Jaffe AE (2017). "recount workflow:
#> Accessing over 70,000 human RNA-seq samples with Bioconductor [version
#> 1; referees: 1 approved, 2 approved with reservations]."
#> _F1000Research_. doi: 10.12688/f1000research.12223.1 (URL:
#> https://doi.org/10.12688/f1000research.12223.1), <URL:
#> https://f1000research.com/articles/6-1558/v1>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {recount workflow: Accessing over 70,000 human RNA-seq samples with Bioconductor [version 1; referees: 1 approved, 2 approved with reservations]},
#>     author = {Leonardo Collado-Torres and Abhinav Nellore and Andrew E. Jaffe},
#>     year = {2017},
#>     journal = {F1000Research},
#>     doi = {10.12688/f1000research.12223.1},
#>     url = {https://f1000research.com/articles/6-1558/v1},
#>   }

Please note that the recount was only made possible thanks to many other R and bioinformatics software authors, which are cited either in the vignettes and/or the paper(s) describing this package.

Code of Conduct

Please note that the recountWorkshop2020 project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Development tools

For more details, check the dev directory.

This package was developed using biocthis.