nf-core/rnaseq

Add `tximport` for RSEM outputs

Description of feature

Hello,
Hope you are doing well! I am feeding the output of this pipeline, run with --aligner star_rsem, into nf-core/differentialabundance. I prefer the star_rsem option because I saw odd results in a dataset where some samples were treated with plasmids and others weren't: with kallisto/salmon, the samples without plasmids got reads assigned to the plasmid 😱 I didn't have this issue with RSEM.

Anyway, I'm trying to be a good bioinformatician and use the transcript lengths from #1123. However, the star_rsem output folder doesn't contain the *.gene_lengths.tsv file that differentialabundance needs for --transcript_length_matrix:

From nf-core/rnaseq documentation

STAR via RSEM

  • Output files
    • star_rsem/
      • rsem.merged.gene_counts.tsv: Matrix of gene-level raw counts across all samples.
      • rsem.merged.gene_tpm.tsv: Matrix of gene-level TPM values across all samples.
      • rsem.merged.transcript_counts.tsv: Matrix of isoform-level raw counts across all samples.
      • rsem.merged.transcript_tpm.tsv: Matrix of isoform-level TPM values across all samples.
      • *.genes.results: RSEM gene-level quantification results for each sample.
      • *.isoforms.results: RSEM isoform-level quantification results for each sample.
      • *.STAR.genome.bam: If --save_align_intermeds is specified the original BAM file containing read alignments to the reference genome will be placed in this directory.
      • *.transcript.bam: If --save_align_intermeds is specified the original BAM file containing read alignments to the transcriptome will be placed in this directory.
    • star_rsem/<SAMPLE>.stat/
      • *.cnt, *.model, *.theta: RSEM counts and statistics for each sample.
    • star_rsem/log/
      • *.log: STAR alignment report containing the mapping results summary.

RSEM is a software package for estimating gene and isoform expression levels from RNA-seq data. It has been widely touted as one of the most accurate quantification tools for RNA-seq analysis. RSEM wraps other popular tools to map the reads to the genome (i.e. STAR, Bowtie2, HISAT2; STAR is used in this pipeline) which are then subsequently filtered relative to a transcriptome before quantifying at the gene- and isoform-level. Other advantages of using RSEM are that it performs both the alignment and quantification in a single package and its ability to effectively use ambiguously-mapping reads.

You can choose to align and quantify your data with RSEM by providing the --aligner star_rsem parameter.

I was able to get around this by creating my own gene lengths file (see script here: nf-core/differentialabundance#279 (comment)), but it would be great to incorporate this into the main nf-core/rnaseq pipeline for other RSEM users.
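
For anyone who wants to try something similar in the meantime, here is a minimal sketch of that kind of workaround. It is not the script from the linked comment. It reads the per-sample *.genes.results files (RSEM writes `gene_id`, `length` and `effective_length` columns there) and merges one length column into a genes-by-samples TSV. The function names, the output file name, the choice of `effective_length`, and the assumption that --transcript_length_matrix accepts this layout are all mine, so please check them against the differentialabundance docs before relying on it.

```python
import csv
from pathlib import Path

SUFFIX = ".genes.results"


def read_gene_lengths(lines, column="effective_length"):
    """Return {gene_id: length} from one RSEM *.genes.results table.

    `lines` is any iterable of text lines (an open file works).
    `column` can be "length" or "effective_length"; which one is
    appropriate downstream is an assumption to verify.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["gene_id"]: float(row[column]) for row in reader}


def merge_gene_lengths(per_sample):
    """Turn {sample: {gene_id: length}} into (header, rows).

    Genes missing from a sample are filled with "NA".
    """
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s].keys() for s in samples)))
    header = ["gene_id"] + samples
    rows = [[g] + [per_sample[s].get(g, "NA") for s in samples] for g in genes]
    return header, rows


def write_matrix(results_dir, out_path, column="effective_length"):
    """Collect every *.genes.results under results_dir into one TSV."""
    per_sample = {}
    for path in sorted(Path(results_dir).glob("*" + SUFFIX)):
        sample = path.name[: -len(SUFFIX)]  # sample name = file name minus suffix
        with open(path, newline="") as handle:
            per_sample[sample] = read_gene_lengths(handle, column)
    header, rows = merge_gene_lengths(per_sample)
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

Usage would look something like `write_matrix("star_rsem", "rsem.merged.gene_lengths.tsv")`, where the output name is just a placeholder I picked to mirror the other merged files.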

Thanks and hope you're having a great day!

Warmest,
Olga