nf-core/rnaseq

Hyphenated sample names cause downstream error

Closed this issue · 2 comments

Description of the bug

I ran into an error in the SummarizedExperiment process:

Process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SE_TRANSCRIPT (all_samples)`

Error message is from R:

Error in findColumnWithAllEntries(ids, metadata) : 
No column contains all vector entries

I tracked it down to the parse_metadata function in the R script:

metadata_id_col <- findColumnWithAllEntries(ids, metadata)

I had used hyphens in my sample names, but the ids passed to findColumnWithAllEntries have every hyphen replaced with '.',
e.g. "D10-D_Na-R1" becomes "D10.D_Na.R1", so they no longer match any column in the metadata.

It looks like this happens with the Salmon output: the column names in salmon.merged.transcript_counts.tsv, which are used to set the ids variable in the R script, contain the altered sample names.
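The substitution looks like what R's make.names() does to column names (read.table and friends apply it when check.names = TRUE). Whether that is the exact step responsible in the pipeline is an assumption on my part. A rough Python sketch of that renaming, for illustration only: `r_make_names_approx` is a hypothetical helper, it only handles ASCII, and it ignores R's reserved words and duplicate handling.

```python
import re


def r_make_names_approx(name: str) -> str:
    """Approximate R's make.names() for a single ASCII name (illustrative only)."""
    # Characters other than letters, digits, '.' and '_' are replaced with '.'
    cleaned = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # A valid name starts with a letter, or with a '.' not followed by a digit;
    # anything else gets an 'X' prepended
    if not re.match(r"^([A-Za-z]|\.(?![0-9]))", cleaned):
        cleaned = "X" + cleaned
    return cleaned


for sample in ["D10-D_Na-R1", "D10_D_Na_R1", "123"]:
    print(sample, "->", r_make_names_approx(sample))
```

Names made only of letters, digits, '.' and '_' that start with a letter pass through unchanged, which is why renaming the samples in the sample sheet avoids the error.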

An easy workaround is to correct the names in the sample sheet.

But it might be useful to add another check when initially parsing the sample sheet, to catch this right out of the gate.
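Something along these lines could work as a pre-flight check. This is only a sketch, not how the pipeline validates its input: `find_unsafe_sample_names` is a hypothetical helper, the `sample` column name is assumed, and the "safe" pattern is deliberately conservative (a leading letter followed by letters, digits, '.' or '_').

```python
import csv
import io
import re

# Conservative pattern: names matching this should survive R's column renaming
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._]*")


def find_unsafe_sample_names(samplesheet_text, column="sample"):
    """Return sample names (in first-seen order) that R would likely rename."""
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    unsafe = []
    for row in reader:
        name = row[column]
        # fullmatch so that a trailing invalid character is also caught
        if not SAFE_NAME.fullmatch(name) and name not in unsafe:
            unsafe.append(name)
    return unsafe


sheet = (
    "sample,fastq_1\n"
    "D10-D_Na-R1,a.fastq.gz\n"
    "D10_D_Na_R1,b.fastq.gz\n"
    "123,c.fastq.gz\n"
)
for bad in find_unsafe_sample_names(sheet):
    print(f"Sample name '{bad}' may be altered downstream; use letters, digits, '.' or '_'")
```

Failing early with a message like this would point straight at the sample sheet instead of at the R script.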

Command used and terminal output

#!/bin/bash
#SBATCH --job-name=fashe
#SBATCH -p barc
#SBATCH -t 12:00:00
#SBATCH --mem=8G
#SBATCH -o log/rna-%j.out
#SBATCH -e log/rna-%j.err

if [ ! -d log ]; then
    mkdir log
fi

module load nextflow

# using the dev branch because of gzip bug that's been fixed
nextflow run nf-core/rnaseq \
    -profile unc_longleaf \
    -params-file conf/rnaseq_params.yaml \
    -r dev

Relevant files

No response

System information

Nextflow 24.04.2
HPC
Slurm
Singularity
RHEL 8
nf-core/rnaseq dev branch

idot commented

The same error comes up when the sample names are numeric IDs. In that case R prepends an X to the names in salmon.merged.gene_counts.tsv, and this function cannot find the sample column.

I believe this is addressed in #1380. Please reopen if the issue persists.