Hyphenated sample names cause downstream error
Closed this issue · 2 comments
Description of the bug
I ran into an error in the SummarizedExperiment process
`NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SE_TRANSCRIPT (all_samples)`.
The error message comes from R:
Error in findColumnWithAllEntries(ids, metadata) :
No column contains all vector entries
I tracked it down to this call in the parse_metadata function in the R script:
metadata_id_col <- findColumnWithAllEntries(ids, metadata)
I had used hyphens in my sample names, but the ids passed to findColumnWithAllEntries have every hyphen replaced with '.', e.g. "D10-D_Na-R1" becomes "D10.D_Na.R1".
This looks like it comes from the salmon output: the column names of salmon.merged.transcript_counts.tsv, which are used to set the ids variable in the R script, have the incorrect sample names.
The easy fix is to correct the names in the sample sheet.
But it might be useful to add another check when initially parsing the sample sheet, to catch this right out of the gate.
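As a rough illustration of the kind of check suggested above, here is a minimal shell sketch that could run in the submission script before `nextflow run`. It is not part of nf-core/rnaseq. The function name `find_unsafe_sample_names` is hypothetical, and the sketch assumes a comma-separated sample sheet with a header row, unquoted fields, and the sample name in the first column. The pattern is a conservative approximation of what R accepts as a column name: a letter followed only by letters, digits, `.` or `_`. It does not cover R reserved words.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of the pipeline): print each sample
# name that R would likely rewrite when it is used as a column name,
# e.g. "-" replaced with "." or an "X" prepended to a leading digit.
# Assumes: CSV with a header row, no quoted fields, sample name in column 1.
find_unsafe_sample_names() {
    awk -F, '
        # Skip the header and blank names; report each offending name once.
        NR > 1 && $1 != "" && $1 !~ /^[A-Za-z][A-Za-z0-9._]*$/ && !seen[$1]++ {
            print $1
        }
    ' "$1"
}
```

Running `find_unsafe_sample_names` on the sample sheet and aborting when the output is non-empty would flag both names from this issue, "D10-D_Na-R1" and purely numeric IDs, before any jobs are submitted.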
Command used and terminal output
#!/bin/bash
#SBATCH --job-name=fashe
#SBATCH -p barc
#SBATCH -t 12:00:00
#SBATCH --mem=8G
#SBATCH -o log/rna-%j.out
#SBATCH -e log/rna-%j.err
if [ ! -d log ]; then
    mkdir log
fi
module load nextflow
# using the dev branch because of gzip bug that's been fixed
nextflow run nf-core/rnaseq \
    -profile unc_longleaf \
    -params-file conf/rnaseq_params.yaml \
    -r dev
Relevant files
No response
System information
Nextflow 24.04.2
HPC
Slurm
Singularity
Rhel8
nf-core/rnaseq dev branch
The same error comes up when the sample names are numeric IDs. R then prepends X to the names in salmon.merged.gene_counts.tsv, and this function cannot find the samples column.
I believe this is addressed in #1380. Please reopen if the issue persists.