Segmentation fault in salmon quant
astrdhr opened this issue · 4 comments
Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
salmon (bulk mode).
Describe the bug
Segmentation fault error when running salmon quant.
To Reproduce
The behaviour is inconsistent: sometimes quant.sf files are generated, sometimes not.
- Which version of salmon was used? 1.10.2 (also occurred using 1.10.1 and 0.14).
- How was salmon installed (compiled, downloaded executable, through bioconda)? installed through bioconda (defined in conda environment).
- Which reference (e.g. transcriptome) was used? BY4742 transcriptome generated using gffread (command used: "gffread -g BY4742.fa -o wt-syn-transcriptome.gff -w wt-syn-transcriptome.fa -v -C BY4742.gff").
- Which read files were used? Paired end fastq files (trimmed by fastp).
- Which program options were used? "--validateMappings --threads 1 --libType A --index transcriptome-index --mates1 sample1_R1_001.trimmed.fastq.gz --mates2 sample1_R2_001.trimmed.fastq.gz --output sample1"
Expected behavior
salmon quant should generate a quant.sf file.
Screenshots
Version Info: ### PLEASE UPGRADE SALMON ###
### A newer version of salmon with bug fixes is available. ####
###
The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains new features, improvements, and bug fixes; please upgrade at your
earliest convenience.
###
Sign up for the salmon mailing list to hear about new versions, features and updates at:
https://oceangenomics.com/subscribe
###
### salmon (mapping-based) v0.14.1
### [ program ] => salmon
### [ command ] => quant
### [ validateMappings ] => { }
### [ threads ] => { 1 }
### [ libType ] => { A }
### [ index ] => { transcriptome-index }
### [ mates1 ] => { sample1_R1_001.trimmed.fastq.gz }
### [ mates2 ] => { sample1_R2_001.trimmed.fastq.gz }
### [ output ] => { sample1 }
Logs will be written to sample1/logs
[2023-10-11 16:03:44.489] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
[2023-10-11 16:03:44.490] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2023-10-11 16:03:44.490] [jointLog] [info] Usage of --validateMappings, without --hardFilter implies use of range factorization. rangeFactorizationBins is being set to 4
[2023-10-11 16:03:44.490] [jointLog] [info] Usage of --validateMappings implies a default consensus slack of 0.2. Setting consensusSlack to 0.2.
[2023-10-11 16:03:44.490] [jointLog] [info] parsing read library format
[2023-10-11 16:03:44.491] [jointLog] [info] There is 1 library.
[2023-10-11 16:03:45.109] [jointLog] [info] Loading Quasi index
[2023-10-11 16:03:45.111] [jointLog] [info] Loading 32-bit quasi index
[2023-10-11 16:03:45.173] [stderrLog] [info] Loading Suffix Array
[2023-10-11 16:03:46.096] [stderrLog] [info] Loading Transcript Info
[2023-10-11 16:03:46.382] [stderrLog] [info] Loading Rank-Select Bit Array
[2023-10-11 16:03:46.474] [stderrLog] [info] There were 6195946 set bits in the bit array
[2023-10-11 16:03:46.481] [stderrLog] [info] Computing transcript lengths
[2023-10-11 16:03:46.481] [stderrLog] [info] Waiting to finish loading hash
[2023-10-11 16:03:56.007] [jointLog] [info] done
[2023-10-11 16:03:56.007] [jointLog] [info] Index contained 3744 targets
[2023-10-11 16:03:56.006] [stderrLog] [info] Done loading index
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
.command.sh: line 9: 64 Segmentation fault salmon quant --validateMappings --threads 1 --libType A --index transcriptome-index --mates1 sample1_R1_001.trimmed.fastq.gz --mates2 sample1_R2_001.trimmed.fastq.gz --output sample1
Desktop (please complete the following information):
- OS: Linux
- Version: Linux 0f0e43816679 6.4.16-linuxkit
Hi @astrdhr,
Thank you for the bug report. Could you run salmon --version under the invocation that is failing? I bring this up because your output starts with:
Version Info: ### PLEASE UPGRADE SALMON ###
and this should not happen if you are using the most recent version (v1.10 included a segfault bugfix directly related to what you are seeing).
If you are running salmon through some sort of script or job submission system, it's possible that the version of salmon available in your PATH isn't the same as the most recent one you have installed.
P.S. I'll also note that v0.14 and v1.10 don't have compatible indices, which can also cause a segfault. You should make sure that the index was generated with the version of salmon with which you are attempting to quantify.
--Rob
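One way to check for the mismatch described above is to compare the version that salmon --version reports with the version printed in the banner of the failing run. The following is a minimal Python sketch, not part of salmon. The helper names (banner_version, cli_version, resolve_salmon) are hypothetical, and the regular expressions assume the two output formats that appear in this thread ("### salmon (mapping-based) v0.14.1" and "salmon 1.10.2").

```python
import re
import shutil
import subprocess
from typing import Optional


def banner_version(log_text: str) -> Optional[str]:
    """Extract the version from a run banner line such as
    '### salmon (mapping-based) v0.14.1' (format assumed from this thread)."""
    match = re.search(
        r"^### salmon \([^)]*\) v(\d+(?:\.\d+)*)", log_text, re.MULTILINE
    )
    return match.group(1) if match else None


def cli_version(version_output: str) -> Optional[str]:
    """Extract the version from `salmon --version` output such as 'salmon 1.10.2'.
    Unrelated lines (e.g. jemalloc notices under QEMU) are skipped."""
    match = re.search(r"^salmon (\d+(?:\.\d+)*)\s*$", version_output, re.MULTILINE)
    return match.group(1) if match else None


def resolve_salmon() -> Optional[str]:
    """Return the path of the first salmon executable found on PATH, if any."""
    return shutil.which("salmon")


if __name__ == "__main__":
    path = resolve_salmon()
    print("salmon on PATH:", path)
    if path is not None:
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        # Search both streams; the newline keeps the two outputs on separate lines.
        print("reported version:", cli_version(result.stdout + "\n" + result.stderr))
```

Running this inside the same process or container that fails, rather than in an interactive shell, would show which executable is actually resolved there. Passing the captured log of a failing run to banner_version gives the second value to compare against.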
Hi Rob,
Thanks for your reply. When I run salmon --version I get this:
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)
salmon 1.10.2
I'm running the script using Nextflow in a Docker container. However, whether I run the script locally, within Nextflow, or on an HPC cluster, it runs using salmon v0.14.1 (despite my environment.yml file specifying 1.10.2) and gives the same error. I also installed Salmon through a bioconda channel; I'm not sure if that has any impact.
On your last point - I haven't noticed different versions being used but I'll look out for this.
Hi @astrdhr,
Ok, so the difference between the version you get on the command line, versus the version you get when you actually attempt to run your script to process your data, is certainly a point of concern. In general, the behavior you are seeing during runtime seems like it may be an artifact of not having a compatible index.
Is it possible for you to do a "test run" outside of the Nextflow script? Since you are getting v1.10.2 locally, and this version should work without segmentation fault, that would at least let us narrow the issue down to different versions of salmon being invoked at different stages of the pipeline. At that point, it may be a Nextflow / nf-core issue, but those folks are great and will be able to help in a jiffy!
--Rob
Hi @rob-p,
I think I have figured out the issue. There seems to be a dependency conflict over the ICU (International Components for Unicode) package between Salmon and R. It's been mentioned in this issue as well: #594. I cannot have both the newest version of R and the newest version of Salmon in the same environment.
For context, I've been installing salmon>=1.10.1 through the bioconda channel and r-base>=4.3.2 through conda-forge. Whenever R is in the same environment, Salmon falls back to v0.14.1 when the pipeline runs (even though the command line reports the newest version). If I remove R, Salmon uses the newest version both in the pipeline and on the command line, and works as normal.
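Since the environment can silently resolve to an older salmon, a pipeline step could check the version before quantifying and stop early if it is too old. A minimal Python sketch follows, assuming version strings are plain dot-separated integers like those in this thread. The helper names parse_version and require_min_version are hypothetical, and 1.10.1 is used as the minimum only because that is the constraint mentioned above.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.10.2' into (1, 10, 2) so that versions compare numerically.
    Assumes dot-separated integers only (no suffixes such as 'rc1')."""
    return tuple(int(part) for part in text.strip().split("."))


def require_min_version(found: str, minimum: str) -> None:
    """Raise RuntimeError if `found` is older than `minimum`."""
    if parse_version(found) < parse_version(minimum):
        raise RuntimeError(
            f"salmon {found} is older than the required {minimum}; "
            "check which environment the pipeline step is using"
        )


# Numeric comparison matters: as plain strings, '1.10.2' sorts before '1.9.0'.
require_min_version("1.10.2", "1.10.1")  # passes silently
```

With the versions reported in this thread, require_min_version("0.14.1", "1.10.1") would raise, which turns the silent fallback into an explicit error before any index is loaded.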