Overview

If you would like to contribute human RNA-seq data sequenced on the Illumina platform to recount2, you will need to modify and submit this form, in which you describe your data and tell us how to access the required files. We'll respond as soon as we can and add your data to recount2. For more information, read the recount2 paper in Nature Biotechnology.

Getting help

If you're confused by any of the instructions below or are having trouble with your submission, feel free to chat with us in the recount2 contributions Gitter or create an issue.

Generating recount2-like files

Before we can add your RNA-seq data to recount2, you'll need to generate files similar to the ones we already provide for each project on SRA. This means you'll have to run Rail-RNA on your dataset against the human reference genome hg38. Rail-RNA will generate a set of deliverables. For recount2, we'll need the cross-sample tables, coverage vectors in bigWig format, and the exon-exon junction files. After the Rail-RNA run is complete, you'll need to execute this set of R scripts, which create the remaining files we need. Once they are generated, you may want to add more phenotype information for your samples. Then you can modify and submit this form, and we'll take it from there. Details follow.

Run Rail-RNA

The first step involves running Rail-RNA with your data. You will find more information about how to do this at the Rail-RNA documentation website. You can run it on the cloud using Amazon Web Services Elastic MapReduce. Note that it's crucial that the Rail-RNA deliverables include:

cross-sample tables: tsv
coverage vectors: bw
junction files: jx

and also that reads are aligned to the hg38 assembly. If you perform the alignment locally, please run Rail-RNA using the Bowtie indexes from the hg38 Illumina iGenome. If you perform the alignment using Amazon Web Services Elastic MapReduce, make sure to use the command-line parameter -a hg38. So in local mode, a single command to preprocess and align your RNA-seq data should look like this:

    rail-rna go local -x /path/to/hg38/Bowtie/basename /path/to/hg38/Bowtie2/basename \
    -m /path/to/Rail-RNA/manifest/file -o /path/to/output/dir -d tsv,bw,jx

while in elastic (cloud) mode, the command should look like this:

    rail-rna go elastic -a hg38 -m /path/to/Rail-RNA/manifest/file -o s3://bucket-name/output-dir \
    -d tsv,bw,jx -c <number of core instances>
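The two invocations above differ only in a few arguments, so it can help to assemble the command in one place and print it for review before launching a long run. The following is a minimal sketch: `build_rail_cmd` and its argument order are hypothetical, and it uses only the flags shown in the commands above.

```shell
#!/usr/bin/env bash
# Sketch: assemble the Rail-RNA command for local or elastic mode and print it.
# build_rail_cmd is a hypothetical helper; only flags that appear in the
# commands above are used. Nothing is executed, the command is only printed.

build_rail_cmd() {
    local mode="$1" manifest="$2" output="$3"
    local deliverables="tsv,bw,jx"   # the three deliverables recount2 needs
    case "$mode" in
        local)
            # $4 and $5: Bowtie and Bowtie 2 index basenames for hg38
            printf 'rail-rna go local -x %s %s -m %s -o %s -d %s\n' \
                "$4" "$5" "$manifest" "$output" "$deliverables"
            ;;
        elastic)
            # $4: number of core instances
            printf 'rail-rna go elastic -a hg38 -m %s -o %s -d %s -c %s\n' \
                "$manifest" "$output" "$deliverables" "$4"
            ;;
        *)
            echo "unknown mode: $mode (expected 'local' or 'elastic')" >&2
            return 1
            ;;
    esac
}
```

For example, `build_rail_cmd elastic manifest.txt s3://bucket-name/output-dir 4` prints the elastic-mode command with four core instances. Keeping the deliverables string in a single variable makes it harder to forget one of `tsv`, `bw`, or `jx`.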

Create recount2 objects

Once you have the output from Rail-RNA, you will need to run the recount-prep R scripts. If you've run Rail-RNA in the cloud, you'll have to download its output to your local system. To run the R scripts, you'll first need to install some dependencies. These include the command-line tools named in the variable list below (bwtool v1.0, wiggletools v1.1, and the UCSC wigToBigWig executable), as well as the following R/Bioconductor packages, which can be installed with this R command:

    install.packages("BiocManager")
    BiocManager::install(c("recount", "devtools", "getopt", "downloader",
        "SummarizedExperiment", "Hmisc"))

Now run prep_setup.R, which downloads some files that the other scripts need. Next, run prep_sample.R for each sample in your data set. You can perform this step in parallel if you like. Finally, run prep_merge.R to create the final recount2 objects. An example bash script that runs all three scripts is available as example_prep.sh. If you choose to model your script after this one, make sure to change the variable definitions in it as follows.

    DATADIR: (local) path to Rail-RNA output directory
    BWTOOL: path to bwtool v1.0 executable
    WIGGLE: path to wiggletools v1.1 executable
    WIGTOBIGWIG: path to UCSC wigToBigWig executable
    MANIFEST: path to Rail-RNA manifest file that was used in the `rail-rna` command invocation
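Because the preparation scripts can run for a long time, it may be worth confirming that these five variables point at real paths before starting. The sketch below is one way to do that. `check_exec` and `check_prep_vars` are hypothetical helpers and not part of recount-prep, and the checks only test that the paths exist and are executable, not that the tool versions match.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the variables used by example_prep.sh.
# check_exec and check_prep_vars are hypothetical helpers (assumptions),
# not part of the recount-prep scripts.

check_exec() {  # usage: check_exec LABEL PATH
    if [ ! -f "$2" ] || [ ! -x "$2" ]; then
        echo "$1 is not an executable file: '$2'" >&2
        return 1
    fi
}

check_prep_vars() {
    local status=0
    # DATADIR must be an existing (local) directory with the Rail-RNA output
    if [ ! -d "${DATADIR:-}" ]; then
        echo "DATADIR is not a directory: '${DATADIR:-}'" >&2
        status=1
    fi
    # MANIFEST must be a readable regular file
    if [ ! -f "${MANIFEST:-}" ] || [ ! -r "${MANIFEST:-}" ]; then
        echo "MANIFEST is not a readable file: '${MANIFEST:-}'" >&2
        status=1
    fi
    # The three command-line tools must be executable files
    check_exec BWTOOL "${BWTOOL:-}" || status=1
    check_exec WIGGLE "${WIGGLE:-}" || status=1
    check_exec WIGTOBIGWIG "${WIGTOBIGWIG:-}" || status=1
    return "$status"
}
```

Calling `check_prep_vars || exit 1` right after the variable definitions stops the script early and reports every variable that needs attention, rather than failing on the first one.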

If you have more metadata (phenotype information for your samples) than what is included by default in the recount2 objects, you can add it to the RangedSummarizedExperiment objects once they are created, or modify the preparation R scripts accordingly. For example, adding tissue, cell line, age, sex, and other demographic variables can be of great use to other researchers.

Submit files

Once you have created all the recount2 objects, please modify and submit this form. In it, we ask you for information on how to contact you, information about your dataset, and instructions on how to access the recount files you created. We will download and check your files. If they're approved, we'll upload them to recount2. The files we'll need access to are:

The bigWig coverage files for each sample created by Rail-RNA: coverage_bigwigs/*.bw
The junction files created by Rail-RNA
The normalized mean coverage file: bw/mean.bw
counts_exon.tsv.gz
counts_gene.tsv.gz
rse_exon.Rdata
rse_gene.Rdata
rse_jx.Rdata
Log files created by the R scripts for reproducibility purposes
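Before submitting, a quick check that the named files are present can save a round trip. The sketch below assumes, as an illustration only, that all files sit under a single directory using the relative paths listed above. `check_submission` is a hypothetical helper. It does not check the junction files or the log files, since their names are not specified here.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the files named in the list above exist and are
# non-empty. check_submission is a hypothetical helper, and the single
# top-level directory layout is an assumption.

check_submission() {
    local dir="$1" missing=0 f
    for f in bw/mean.bw counts_exon.tsv.gz counts_gene.tsv.gz \
             rse_exon.Rdata rse_gene.Rdata rse_jx.Rdata; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Expand the glob into the function's positional parameters; if nothing
    # matches, $1 holds the unexpanded pattern, which does not exist.
    set -- "$dir"/coverage_bigwigs/*.bw
    if [ ! -e "${1:-}" ]; then
        echo "no per-sample bigWig files in coverage_bigwigs/" >&2
        missing=$((missing + 1))
    fi
    [ "$missing" -eq 0 ]
}
```

Running `check_submission /path/to/output/dir` returns a non-zero status and lists each missing item on standard error if anything is absent.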

The RangedSummarizedExperiment objects contain the sample metadata that we'll use. You should make sure that all three objects have the same metadata.

Summary

Run Rail-RNA on your data against the human hg38 genome reference.
Install dependencies for the recount2 R scripts.
If you ran Rail-RNA in the cloud, download its deliverables to your local system, keeping the same file structure.
Run the prep_setup.R R script.
Run the R script prep_sample.R for each sample.
Run the R script prep_merge.R.
Optionally add more metadata to your phenotype information.
Make the files accessible to us.
Modify and submit this form.

leekgroup/recount-contributions