Quality control and differential expression for bcbio RNA-seq experiments.
This is an R package.
Bioconductor method
source("https://bioconductor.org/biocLite.R")
biocLite("devtools")
biocLite("GenomeInfoDbData")
biocLite(
    "hbc/bcbioRNASeq",
    dependencies = c("Depends", "Imports", "Suggests")
)
conda method
conda install -c bioconda r-bcbiornaseq
Load bcbio run
library(bcbioRNASeq)
bcb <- loadRNASeq(
    uploadDir = "bcbio_rnaseq_run/final",
    interestingGroups = c("genotype", "treatment"),
    organism = "Homo sapiens"
)
# Back up all data inside bcbioRNASeq object
flatFiles <- flatFiles(bcb)
saveData(bcb, flatFiles)
This will return a bcbioRNASeq object, which is an extension of the Bioconductor RangedSummarizedExperiment container class.
Parameters:
uploadDir
: Path to the bcbio final upload directory.
interestingGroups
: Character vector of the column names of interest in the sample metadata, which is stored in the colData() accessor slot of the bcbioRNASeq object. These values should be formatted in camelCase, and can be reassigned in the object after creation (e.g. interestingGroups(bcb) <- c("batch", "age")). They are used for data visualization in the quality control utility functions.
organism
: Organism name. Use the full Latin name (e.g. "Homo sapiens").
Consult help("loadRNASeq", "bcbioRNASeq") for additional documentation.
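As a rough illustration of the interestingGroups requirements described above (camelCase names that match columns in the sample metadata), the following base R sketch checks a candidate vector against a metadata table. The data frame stands in for colData(bcb), and checkInterestingGroups() is a hypothetical helper written for this example, not a function provided by the package.

```r
# Hypothetical sample metadata, standing in for colData(bcb).
sampleData <- data.frame(
    genotype = c("wildtype", "knockout", "wildtype", "knockout"),
    treatment = c("control", "control", "drug", "drug"),
    row.names = paste0("sample_", 1:4)
)

# Hypothetical helper (not part of bcbioRNASeq): keep only the groups that
# are camelCase and present as column names in the metadata.
checkInterestingGroups <- function(groups, data) {
    isCamel <- grepl("^[a-z][A-Za-z0-9]*$", groups)
    isColumn <- groups %in% colnames(data)
    groups[isCamel & isColumn]
}

valid <- checkInterestingGroups(
    groups = c("genotype", "treatment", "Batch_ID"),
    data = sampleData
)
print(valid)
```

Here "Batch_ID" is dropped because it is neither camelCase nor a column in the metadata. Running a check like this before assigning interestingGroups can catch typos early.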
R Markdown templates
This package provides multiple R Markdown templates, including Quality Control and Differential Expression using DESeq2, which are available in RStudio at File -> New File -> R Markdown... -> From Template.
View example HTML reports rendered from the default R Markdown templates included in the package:
For a normal bcbio RNA-seq run, the sample metadata will be imported automatically using the project-summary.yaml file in the final upload directory. If you notice any typos in your metadata after completing the run, these can be corrected in the YAML file. Alternatively, you can pass a sample metadata file to loadRNASeq() using the sampleMetadataFile parameter.
The sample IDs in the bcbioRNASeq object map to the description column, which gets sanitized internally into a sampleID column. The sample names provided in the description column must be unique.
fileName | description | genotype |
---|---|---|
sample_1_R1.fastq.gz | sample_1 | wildtype |
sample_2_R1.fastq.gz | sample_2 | knockout |
sample_3_R1.fastq.gz | sample_3 | wildtype |
sample_4_R1.fastq.gz | sample_4 | knockout |
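A minimal sketch of preparing such a metadata file in base R, assuming a CSV file is an accepted format for sampleMetadataFile. The temporary file path and the object names (meta, metadataFile) are arbitrary choices for this example, and the loadRNASeq() call is left commented out because it requires a real bcbio upload directory.

```r
# Sample metadata mirroring the table above.
meta <- data.frame(
    fileName = paste0("sample_", 1:4, "_R1.fastq.gz"),
    description = paste0("sample_", 1:4),
    genotype = c("wildtype", "knockout", "wildtype", "knockout"),
    stringsAsFactors = FALSE
)

# The description column must be unique, so fail early if it is not.
stopifnot(!anyDuplicated(meta$description))

# Write the table to a temporary CSV file (path chosen for this sketch only).
metadataFile <- file.path(tempdir(), "sample_metadata.csv")
write.csv(meta, file = metadataFile, row.names = FALSE)

# The file could then be passed to loadRNASeq(), for example:
# bcb <- loadRNASeq(
#     uploadDir = "bcbio_rnaseq_run/final",
#     sampleMetadataFile = metadataFile
# )
```

Checking for duplicate descriptions before loading the run avoids discovering the problem only after the import has started.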
Use sampleNameAggregate to assign groupings for technical replicates:
fileName | description | sampleNameAggregate |
---|---|---|
wildtype_L001_R1.fastq.gz | wildtype_L001 | wildtype |
wildtype_L002_R1.fastq.gz | wildtype_L002 | wildtype |
wildtype_L003_R1.fastq.gz | wildtype_L003 | wildtype |
wildtype_L004_R1.fastq.gz | wildtype_L004 | wildtype |
mutant_L001_R1.fastq.gz | mutant_L001 | mutant |
mutant_L002_R1.fastq.gz | mutant_L002 | mutant |
mutant_L003_R1.fastq.gz | mutant_L003 | mutant |
mutant_L004_R1.fastq.gz | mutant_L004 | mutant |
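To make the idea of grouping technical replicates concrete, here is a conceptual base R sketch that sums lane-level counts by their sampleNameAggregate value. It is not the package's internal implementation: the count matrix is made up, and summing with rowsum() is an assumption about what aggregation means for this illustration.

```r
# Lane-level sample names, as in the description column above.
samples <- c(paste0("wildtype_L00", 1:4), paste0("mutant_L00", 1:4))

# Made-up count matrix: 3 genes (rows) by 8 lane-level samples (columns).
counts <- matrix(
    data = 1:24,
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), samples)
)

# Grouping vector, equivalent to the sampleNameAggregate column above.
sampleNameAggregate <- sub("_L[0-9]+$", "", samples)

# rowsum() sums rows by group, so transpose to put samples in rows,
# aggregate, then transpose back to genes by aggregated samples.
aggregated <- t(rowsum(t(counts), group = sampleNameAggregate))
print(aggregated)
```

The result has one column per aggregate name (wildtype and mutant) instead of one column per sequencing lane.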
citation("bcbioRNASeq")
Steinbaugh MJ, Pantano L, Kirchner RD, Barrera V, Chapman BA, Piper ME, Mistry M, Khetani RS, Rutherford KD, Hoffman O, Hutchinson JN, Ho Sui SJ. (2017). bcbioRNASeq: R package for bcbio RNA-seq analysis. F1000Research 6:1976.
The papers and software cited in our workflows are available as a shared library on Paperpile.