bcbioRNASeq

Quality control and differential expression for bcbio RNA-seq experiments.

Installation

This is an R package.

Bioconductor method

source("https://bioconductor.org/biocLite.R")
biocLite("devtools")
biocLite("GenomeInfoDbData")
biocLite(
    "hbc/bcbioRNASeq",
    dependencies = c("Depends", "Imports", "Suggests")
)

conda method

conda install -c bioconda r-bcbiornaseq

Load bcbio run

library(bcbioRNASeq)
bcb <- loadRNASeq(
    uploadDir = "bcbio_rnaseq_run/final",
    interestingGroups = c("genotype", "treatment"),
    organism = "Homo sapiens"
)
# Back up all data inside bcbioRNASeq object
flatFiles <- flatFiles(bcb)
saveData(bcb, flatFiles)

This will return a bcbioRNASeq object, which is an extension of the Bioconductor RangedSummarizedExperiment container class.

Parameters:

uploadDir: Path to the bcbio final upload directory.
interestingGroups: Character vector of the column names of interest in the sample metadata, which is stored in the colData() accessor slot of the bcbioRNASeq object. These values should be formatted in camelCase, and can be reassigned in the object after creation (e.g. interestingGroups(bcb) <- c("batch", "age")). They are used for data visualization in the quality control utility functions.
organism: Organism name. Use the full Latin name (e.g. "Homo sapiens").

Consult help("loadRNASeq", "bcbioRNASeq") for additional documentation.

This package provides multiple R Markdown templates, including Quality Control and Differential Expression using DESeq2, which are available in RStudio at File -> New File -> R Markdown... -> From Template.

Examples

View example HTML reports rendered from the default R Markdown templates included in the package:

Sample metadata

For a normal bcbio RNA-seq run, the sample metadata is imported automatically from the project-summary.yaml file in the final upload directory. If you notice any typos in your metadata after completing the run, you can correct them in the YAML file. Alternatively, you can pass a sample metadata file to loadRNASeq() using the sampleMetadataFile parameter.

Minimal example

The sample IDs in the bcbioRNASeq object map to the description column, which gets sanitized internally into a sampleID column. The sample names provided in the description column must be unique.

fileName	description	genotype
sample_1_R1.fastq.gz	sample_1	wildtype
sample_2_R1.fastq.gz	sample_2	knockout
sample_3_R1.fastq.gz	sample_3	wildtype
sample_4_R1.fastq.gz	sample_4	knockout
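
The uniqueness requirement on the description column can be checked before the file is handed to loadRNASeq(). The following is a minimal sketch in base R. It assumes that a CSV file is an accepted format for sampleMetadataFile. The file name sample_metadata.csv is hypothetical, and the loadRNASeq() call is left commented out because it needs a real bcbio upload directory.

```r
# Build the minimal sample metadata table shown above.
meta <- data.frame(
    fileName = paste0("sample_", 1:4, "_R1.fastq.gz"),
    description = paste0("sample_", 1:4),
    genotype = rep(c("wildtype", "knockout"), times = 2),
    stringsAsFactors = FALSE
)

# The description column must be unique, since it maps to the sample IDs.
stopifnot(anyDuplicated(meta$description) == 0)

# Write the table to disk.
# Hypothetical file name; CSV as the file format is an assumption.
metaFile <- file.path(tempdir(), "sample_metadata.csv")
write.csv(meta, file = metaFile, row.names = FALSE)

# Pass the file to loadRNASeq() (requires bcbioRNASeq and a real run):
# bcb <- loadRNASeq(
#     uploadDir = "bcbio_rnaseq_run/final",
#     sampleMetadataFile = metaFile
# )
```

Running the check up front means a duplicated sample name fails fast, before any of the run data is loaded.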

Technical replicates

Use sampleNameAggregate to assign groupings for technical replicates:

fileName	description	sampleNameAggregate
wildtype_L001_R1.fastq.gz	wildtype_L001	wildtype
wildtype_L002_R1.fastq.gz	wildtype_L002	wildtype
wildtype_L003_R1.fastq.gz	wildtype_L003	wildtype
wildtype_L004_R1.fastq.gz	wildtype_L004	wildtype
mutant_L001_R1.fastq.gz	mutant_L001	mutant
mutant_L002_R1.fastq.gz	mutant_L002	mutant
mutant_L003_R1.fastq.gz	mutant_L003	mutant
mutant_L004_R1.fastq.gz	mutant_L004	mutant
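
To illustrate what the grouping expresses, the base R sketch below rebuilds the table above and lists the lanes that belong to each aggregate sample. It then sums a toy count matrix per group, which is one common way to combine technical replicates. The count values are made up for illustration, and the summing step is an assumption rather than a description of how bcbioRNASeq handles sampleNameAggregate internally.

```r
# Technical replicate metadata from the table above.
lanes <- sprintf("L%03d", 1:4)
meta <- data.frame(
    description = c(paste0("wildtype_", lanes), paste0("mutant_", lanes)),
    sampleNameAggregate = rep(c("wildtype", "mutant"), each = 4),
    stringsAsFactors = FALSE
)

# Which lanes belong to each aggregate sample?
groups <- split(meta$description, meta$sampleNameAggregate)
print(groups)

# Toy count matrix (made-up values): 2 genes x 8 lanes.
counts <- matrix(
    1:16,
    nrow = 2,
    dimnames = list(c("gene1", "gene2"), meta$description)
)

# Sum lane-level counts per aggregate sample.
# rowsum() groups rows, so transpose before and after.
aggregated <- t(rowsum(t(counts), group = meta$sampleNameAggregate))
print(aggregated)
```

Note that rowsum() orders the groups alphabetically by default, so the aggregated columns appear as mutant, then wildtype.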

Citation

citation("bcbioRNASeq")

Steinbaugh MJ, Pantano L, Kirchner RD, Barrera V, Chapman BA, Piper ME, Mistry M, Khetani RS, Rutherford KD, Hoffman O, Hutchinson JN, Ho Sui SJ. (2017). bcbioRNASeq: R package for bcbio RNA-seq analysis. F1000Research 6:1976.

References

The papers and software cited in our workflows are available as a shared library on Paperpile.
