Add `tximport` for RSEM outputs
Description of feature
Hello,
Hope you are doing well! I am using the output of this pipeline, run with `--aligner star_rsem`, as input to nf-core/differentialabundance. I prefer RSEM because I saw odd results in a dataset where some samples were treated with plasmids and others weren't: with kallisto/salmon, the samples without plasmids got reads assigned to the plasmid 😱 I didn't have this issue with RSEM.
Anyway, I'm trying to be a good bioinformatician and use the transcript lengths from #1123. However, I don't see the `*.gene_lengths.tsv` file needed for `--transcript_length_matrix` in differentialabundance among the output files in the `star_rsem` folder:
From the nf-core/rnaseq documentation:

> **STAR via RSEM**
>
> Output files
>
> - `star_rsem/`
>   - `rsem.merged.gene_counts.tsv`: Matrix of gene-level raw counts across all samples.
>   - `rsem.merged.gene_tpm.tsv`: Matrix of gene-level TPM values across all samples.
>   - `rsem.merged.transcript_counts.tsv`: Matrix of isoform-level raw counts across all samples.
>   - `rsem.merged.transcript_tpm.tsv`: Matrix of isoform-level TPM values across all samples.
>   - `*.genes.results`: RSEM gene-level quantification results for each sample.
>   - `*.isoforms.results`: RSEM isoform-level quantification results for each sample.
>   - `*.STAR.genome.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the reference genome will be placed in this directory.
>   - `*.transcript.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the transcriptome will be placed in this directory.
> - `star_rsem/<SAMPLE>.stat/`
>   - `*.cnt`, `*.model`, `*.theta`: RSEM counts and statistics for each sample.
> - `star_rsem/log/`
>   - `*.log`: STAR alignment report containing the mapping results summary.
>
> RSEM is a software package for estimating gene and isoform expression levels from RNA-seq data. It has been widely touted as one of the most accurate quantification tools for RNA-seq analysis. RSEM wraps other popular tools to map the reads to the genome (i.e. STAR, Bowtie2, HISAT2; STAR is used in this pipeline) which are then subsequently filtered relative to a transcriptome before quantifying at the gene- and isoform-level. Other advantages of using RSEM are that it performs both the alignment and quantification in a single package and its ability to effectively use ambiguously-mapping reads.
>
> You can choose to align and quantify your data with RSEM by providing the `--aligner star_rsem` parameter.
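For context on what a tximport-style length would be: as far as I understand, when tximport summarizes transcript-level estimates to genes, the gene length it reports for each sample is the abundance-weighted mean of that gene's transcript lengths. Here is a minimal Python sketch of that per-sample calculation. It assumes rows of `(gene_id, TPM, effective_length)` taken from one sample's RSEM `*.isoforms.results` file. The function name `gene_lengths_from_isoforms` is hypothetical and not part of either pipeline.

```python
from collections import defaultdict


def gene_lengths_from_isoforms(rows):
    """Abundance-weighted mean transcript length per gene for one sample.

    `rows` is an iterable of (gene_id, tpm, effective_length) tuples, e.g. the
    `gene_id`, `TPM` and `effective_length` columns of an RSEM
    `*.isoforms.results` file (hypothetical helper, tximport-style weighting).
    """
    weighted = defaultdict(float)   # sum of TPM * effective length per gene
    total_tpm = defaultdict(float)  # sum of TPM per gene
    lengths = defaultdict(list)     # raw lengths, kept for the zero-abundance fallback
    for gene_id, tpm, eff_len in rows:
        weighted[gene_id] += tpm * eff_len
        total_tpm[gene_id] += tpm
        lengths[gene_id].append(eff_len)

    result = {}
    for gene_id, eff_lens in lengths.items():
        if total_tpm[gene_id] > 0:
            result[gene_id] = weighted[gene_id] / total_tpm[gene_id]
        else:
            # Simplification: unweighted mean when the gene is not expressed in
            # this sample (tximport fills these gaps differently, see below).
            result[gene_id] = sum(eff_lens) / len(eff_lens)
    return result
```

For example, a gene with two isoforms at 10 and 30 TPM and effective lengths of 1000 and 2000 gets a length of 1750. Genes with zero abundance in a sample have no defined weighted mean. This sketch falls back to a plain average there, whereas tximport fills such gaps using the gene's lengths in the other samples.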
I was able to get around this by creating my own gene lengths file (see script here: nf-core/differentialabundance#279 (comment)), but it would be great to incorporate this into the main nf-core/rnaseq pipeline for the benefit of other RSEM users.
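For anyone wanting a similar workaround, here is a minimal sketch of one way to build such a file. It is not the script linked above and makes several assumptions:

- Each per-sample `*.genes.results` file in `star_rsem/` has `gene_id` and `effective_length` columns. According to the RSEM documentation, the gene-level effective length is already an isoform-weighted average.
- All samples were quantified against the same annotation.
- differentialabundance accepts a plain TSV with `gene_id` plus one column per sample for `--transcript_length_matrix`.

The function name `merge_gene_lengths` and the output filename below are hypothetical.

```python
import csv
from pathlib import Path


def merge_gene_lengths(results_files, out_path):
    """Write a gene x sample TSV of RSEM gene-level effective lengths."""
    lengths = {}   # sample name -> {gene_id: effective_length (as text)}
    gene_ids = []  # gene order taken from the first file
    for path in results_files:
        path = Path(path)
        # Sample name is the file name minus the RSEM suffix (requires Python 3.9+).
        sample = path.name.removesuffix(".genes.results")
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            lengths[sample] = {row["gene_id"]: row["effective_length"] for row in reader}
        if not gene_ids:
            gene_ids = list(lengths[sample])

    samples = list(lengths)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene_id", *samples])
        for gene_id in gene_ids:
            # Raises KeyError if a sample lacks a gene, i.e. mismatched annotations.
            writer.writerow([gene_id, *(lengths[s][gene_id] for s in samples)])
```

Usage would be something like `merge_gene_lengths(sorted(Path("star_rsem").glob("*.genes.results")), "rsem.merged.gene_lengths.tsv")`. Values are copied verbatim as text, so no rounding is introduced.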
Thanks and hope you're having a great day!
Warmest,
Olga