The fbseq
package part of a collection of packages for the fully Bayesian analysis of RNA-sequencing count data, where a hierarchical model is fit with Markov chain Monte Carlo (MCMC). fbseq
is the user interface, and it contains top level functions for calling the MCMC and analyzing output. The other packages, fbseqOpenMP
and fbseqCUDA
, are backend packages that run the MCMC behind the scenes. Only one is required. fbseqOpenMP
can run on most machines, but it is slow for large datasets. fbseqCUDA
requires special hardware (i.e. a CUDA general-purpose graphics processing unit), but it's much faster due to parallel computing.
Read the model vignette first.
An understanding of the underlying hierarchical model is important for understanding how to use this package. For best viewing, download the html file to your desktop and then open it with a browser.
See the "Depends", "Imports", and "Suggests" fields of the "package's DESCRIPTION R version and R package requirements.
Install fbseq
For fbseq
, you can navigate to a list of stable releases on the project's GitHub page. Download the desired tar.gz
bundle, then install it either with install.packages(..., repos = NULL, type="source")
from within R R CMD INSTALL
from the Unix/Linux command line.
For this option, you need the devtools
package, available from CRAN or GitHub. Open R and run
library(devtools)
install_github("wlandau/fbseq")
Open a command line program such as Terminal in Mac/Linux and enter the following commands.
git clone git@github.com:wlandau/fbseq.git
R CMD build fbseq
R CMD INSTALL ...
where ...
is replaced by the name of the tarball produced by R CMD build
.
fbseqOpenMP
and fbseqCUDA
, are backend packages that run the MCMC behind the scenes. Only one is required. fbseqOpenMP
can run on most machines, but it is slow for large datasets. fbseqCUDA
requires special hardware (i.e. a CUDA general-purpose graphics processing unit), but it's much faster due to parallel computing. Installation is similar to that of fbseq
and is detailed in their respective README.md
files and package vignettes.
After installing fbseq
and fbseqOpenMP
, the following should take a couple seconds to run. The example walks through an example data analysis and shows a few key features of the package. For more specific operational details, see the tutorial vignette.
library(fbseq)
back_end = "OpenMP" # change this to "CUDA" to use fbseqCUDA as the backend
# Example RNA-seq dataset wrapped in an S4 class.
data(paschold)
# Use a small subset of the data (few genes).
paschold@counts = head(paschold@counts, 20)
# Run the MCMC for only a few iterations.
configs = Configs(burnin = 10, iterations = 10, thin = 1)
# Set up the MCMC.
chain = Chain(paschold, configs)
# Run 4 dispersed independent Markov chains.
chain_list = fbseq(chain, backend = back_end)
# Get total runtime.
attr(chain_list, "runtime")
# Monitor convergence with in all parameters with
# Gelman-Rubin potential scale reduction factors
head(psrf(chain_list))
# Means, standard deviations, and credible intervals of all parameters
head(estimates(chain_list))
# Means, standard deviations, and credible intervals of
# linear combinations of the "beta" parameters used to calculate
# gene-specific posterior probabilities.
head(contrast_estimates(chain_list))
# Gene-specific (row-specific) posterior probabilities
head(probs(chain_list))
# Monte Carlo samples on a small subset of parameters
m = mcmc_samples(chain_list)
dim(m)
m[1:5, 1:5]
# Set max_iter=Inf below to automatically run until convergence.
max_iter = 3
iter = 1
chain@verbose = as.integer(0) # turn off console messages
chain_list = fbseq(chain, backend = back_end)
# saveRDS(chain_list, "chain_list_so_far.rds") # option to save progress
gelman = psrf(chain_list)
if(any(gelman >= 1.1)) while(iter < max_iter){
chain_list = lapply(chain_list, fbseq, additional_chains = 0, backend = back_end)
gelman = psrf(chain_list)
# saveRDS(chain_list, "chain_list_so_far.rds") # option to save progress
if(all(gelman < 1.1)) break
iter = iter + 1
}
There are multiple options for activating parallel computing.
- Select
backend = "CUDA"
rather thanbackend = "OpenMP
in thefbseq
function. (ThefbseqCUDA
package must be installed.) This option usesCUDA
to massively parallelize each individual MCMC chain ifCUDA
is installed on your machine. - Select
backend = "OpenMP"
andthreads = n
in thefbseq
function, wheren
is greater than 1. (ThefbseqOpenMP
package must be installed.) This option usesOpenMP
to parallelize each individual MCMC chain ifOpenMP
is available on your machine. - Select
backend = "OpenMP"
andprocesses = n
in thefbseq
function, wheren
is greater than 1. (ThefbseqOpenMP
package must be installed.) WhetherOpenMP
is installed or not, this option will distribute some of the MCMC chains overn
parallel processes. Cannot combine withbackend = "CUDA"
orthreads = n
(n > 1
).