A complete set of sample files for testing and trying out the RiboFlow pipeline.
All files and references here are derived from the human genome.
The files in this repo are random subsets of the originally published data. Each sample contains 1 million raw reads, which is a fraction of the original data.
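As an illustration of how a fixed-size random subset of reads can be drawn from a large fastq file, here is a minimal sketch using reservoir sampling. It is not necessarily the procedure used to create the files in this repo; the function names `read_fastq` and `reservoir_sample` and the `seed` parameter are hypothetical and exist only for this example.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping the same generator four times groups consecutive lines by four.
    for header, seq, plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, plus, qual


def reservoir_sample(records: Iterable[FastqRecord], k: int, seed: int = 0) -> List[FastqRecord]:
    """Return k records chosen uniformly at random in a single pass.

    If the input holds fewer than k records, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir
```

For a real file, `read_fastq` can be given an open file handle (for example from `gzip.open(path, "rt")`) and `k` set to 1,000,000. Reservoir sampling keeps memory use proportional to `k` rather than to the size of the input.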
Not all files are required by RiboFlow.
Required File Types:
- Fastq files from ribosome profiling experiments
- Annotation
- Transcriptome Reference
- Filter
Optional File Types:
- Fastq files from RNA-Seq experiments
- Metadata
- Genome Reference
- Post-Genome Reference
Includes raw reads for ribosome profiling and RNA-Seq data. Each sample has two fastq files. All fastq files were obtained by taking a subset of reads from the publicly available data.
- Single cell ribosome profiling data with UMIs: (1cell-2, 1cell-4)
NCBI GEO accession number GSE185732 published in Ozadam, Tonn, Han, et al.
This dataset contains UMIs which need to be removed prior to alignment.
- Bulk ribosome profiling and RNA-Seq data: (GSM1606107 and GSM1606108)
NCBI GEO accession number GSE65778 published in Sidrauski et al.
Note that RNA-Seq data is optional for RiboFlow and .ribo files.
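To make the UMI removal step for the single cell data concrete, here is a minimal sketch of trimming a UMI from a read before alignment. It assumes the UMI sits at the 5' end of the read and has a fixed length; both the position and the length are assumptions for illustration and should be checked against the original publication. The function name `extract_umi` is hypothetical.

```python
from typing import Tuple


def extract_umi(header: str, seq: str, qual: str, umi_length: int) -> Tuple[str, str, str]:
    """Move the first `umi_length` bases of a read into its header.

    Assumption: the UMI occupies the 5' end of the read. The UMI is appended
    to the read name so that it stays available for later deduplication.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) < umi_length:
        raise ValueError("read is shorter than the UMI length")
    umi = seq[:umi_length]
    # Keep any description after the first space in the header untouched.
    name, sep, rest = header.partition(" ")
    new_header = f"{name}_{umi}{sep}{rest}"
    return new_header, seq[umi_length:], qual[umi_length:]
```

Storing the UMI in the read name, rather than discarding it, lets reads that share a UMI and a mapping position be collapsed after alignment.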
The tsv file contains transcript lengths. The bed file contains the region boundaries: CDS, 5'UTR and 3'UTR.
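The relationship between the two annotation files can be sketched as below. This example assumes the tsv file has two tab-separated columns (transcript name, length) and that the bed file uses the standard first four BED columns (name, start, end, region label) with 0-based, half-open coordinates. The column layout, the region labels in the usage note, and the function names are assumptions for illustration, not a description of RiboFlow's own parser.

```python
from typing import Dict, Iterable, List, Tuple

Regions = Dict[str, Dict[str, Tuple[int, int]]]


def parse_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Read 'transcript<TAB>length' lines into a dictionary."""
    lengths: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, length = line.split("\t")[:2]
        lengths[name] = int(length)
    return lengths


def parse_regions(lines: Iterable[str]) -> Regions:
    """Read BED lines into {transcript: {region label: (start, end)}}."""
    regions: Regions = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, start, end, label = line.split("\t")[:4]
        regions.setdefault(name, {})[label] = (int(start), int(end))
    return regions


def check_annotation(lengths: Dict[str, int], regions: Regions) -> List[str]:
    """Return a list of human-readable problems; empty if none are found."""
    problems: List[str] = []
    for name, by_label in regions.items():
        if name not in lengths:
            problems.append(f"{name}: missing from lengths file")
            continue
        for label, (start, end) in by_label.items():
            if not 0 <= start < end <= lengths[name]:
                problems.append(f"{name}: {label} ({start}, {end}) is out of bounds")
    return problems
```

A check like this can catch a mismatch between the two files, such as a region that extends past the end of its transcript, before the pipeline is run.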
Contains metadata for the ribo files in yaml format.
Metadata is optional for RiboFlow and ribo files.
Bowtie2 index files for the transcriptome. The actual output of the RiboFlow pipeline, i.e., ribo files, is obtained using the reads that are mapped to the transcriptome reference.
Bowtie2 index built from the filter sequences, which are mainly ribosomal RNAs and tRNAs.
A mock Hisat2 reference in place of the entire genome. For actual data analysis, users should download a complete human genome such as hg38. Links are available on the Hisat2 website.
Note that this reference has no effect on the output ribo files since the reads are mapped to the transcriptome to generate ribo files. Reads that do not map to the transcriptome are mapped to the genome.
Genomic Reference is an optional parameter for RiboFlow.
A sample bowtie2 reference file as post-genome reference.
Reads that are not mapped to the genome are mapped to the post-genome reference.
As with the genome reference, the post-genome parameter is optional and has no effect on the output ribo files.
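The order in which reads pass through the references can be summarized with a small routing sketch. It is a toy model, not RiboFlow code: each reference is represented by a hypothetical predicate instead of a real Bowtie2 or Hisat2 alignment, and it assumes that reads matching the filter are set aside before the transcriptome step.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# A stage is a name plus a function that says whether a read aligns to it.
Stage = Tuple[str, Callable[[str], bool]]


def route_reads(reads: Iterable[str], stages: Sequence[Stage]) -> Dict[str, List[str]]:
    """Assign each read to the first stage it aligns to, in the given order.

    Reads that align to no stage are collected under 'unaligned'.
    """
    bins: Dict[str, List[str]] = {name: [] for name, _ in stages}
    bins["unaligned"] = []
    for read in reads:
        for name, aligns in stages:
            if aligns(read):
                bins[name].append(read)
                break
        else:
            # No stage accepted the read.
            bins["unaligned"].append(read)
    return bins


# Toy predicates standing in for real alignments (hypothetical read labels).
required_stages: List[Stage] = [
    ("filter", lambda r: r.startswith("rRNA")),
    ("transcriptome", lambda r: r.startswith("tx")),
]
optional_stages: List[Stage] = [
    ("genome", lambda r: r.startswith("gen")),
    ("post_genome", lambda r: r.startswith("post")),
]
```

Calling `route_reads` once with `required_stages` and once with `required_stages + optional_stages` gives the same `"transcriptome"` bin in both cases. This mirrors the statement above that the genome and post-genome references do not change the output ribo files; they only further classify reads that did not map to the transcriptome.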