in silico digestion ignores cleavage setting
Describe the bug
The in silico digestion does not follow the specified cleavage rule: setting "cleavages": 0 still yields peptides with missed cleavages.
To Reproduce
{
"type": "SpectralLibraryGeneration",
"tag": "",
"allFeatures": false,
"inputs": {
"library_input": "/Users/tobiasko/Documents/UP000005640_9606.fasta",
"library_input_type": "fasta",
"search_results": "./msms.txt"
},
"fastaDigestOptions": {
"fragmentation": "HCD",
"digestion": "full",
"cleavages": 0,
"minLength": 7,
"maxLength": 30,
"enzyme": "trypsin",
"specialAas": "KR",
"db": "target"
},
"models": {
"intensity": "Prosit_2020_intensity_HCD",
"irt": "Prosit_2019_irt"
},
"output": "/Users/tobiasko/tmp/20230921/",
"outputFormat": "spectronaut",
"prediction_server": "koina.proteomicsdb.org:443",
"ssl": true,
"numThreads": 3,
"fdr_estimation_method": "mokapot",
"regressionMethod": "spline",
"thermoExe": "ThermoRawFileParser.exe",
"massTolerance": 20,
"unitMassTolerance": "ppm"
}
python run_oktoberfest.py --config_path ~/Downloads/SpecLibUP000005640_config.json
2023-09-21 13:02:52,825 - INFO - oktoberfest::main Oktoberfest version 0.4.0
Copyright (c) 2020-2021 Oktoberfest dev-team. All rights reserved.
Written by
- Wassim Gabriel (wassim.gabriel@tum.de),
- Ludwig Lautenbacher (ludwig.lautenbacher@tum.de),
- Matthew The (matthew.the@tum.de),
- Mario Picciani (mario.picciani@in.tum.de),
- Firas Hamood (firas.hamood@tum.de),
- Cecilia Jensen (cecilia.jensen@tum.de)
at the Technical University of Munich.
2023-09-21 13:02:52,826 - INFO - oktoberfest::main Issued command: run_oktoberfest.py --config_path /Users/tobiasko/Downloads/SpecLibUP000005640_config.json
2023-09-21 13:02:52,826 - INFO - oktoberfest.utils.config::read Reading configuration from /Users/tobiasko/Downloads/SpecLibUP000005640_config.json
2023-09-21 13:02:52,827 - INFO - oktoberfest.utils.config::read Reading configuration from /Users/tobiasko/Downloads/SpecLibUP000005640_config.json
Digesting protein 10000
Digesting protein 20000
2023-09-21 13:03:26,300 - INFO - oktoberfest.runner::generate_spectral_lib No of sequences before Filtering is 5970207
2023-09-21 13:03:34,745 - INFO - oktoberfest.runner::generate_spectral_lib No of sequences after Filtering is 5963628
2023-09-21 13:04:06,752 - INFO - oktoberfest.runner::generate_spectral_lib Indices 0, 7000
Inferring predictions for 7000 spectra with batch site 1000: 100%|███████████████████████████████████████████████████████████| 7/7 [00:07<00:00, 1.11s/it]
Inferring predictions for 7000 spectra with batch site 1000: 100%|███████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 6.84it/s]
2023-09-21 13:04
...
Expected behavior
With "cleavages": 0, I don't expect to see peptides with internal K or R (i.e., missed cleavages), but I get:
head prosit_input.csv
modified_sequence,collision_energy,precursor_charge,fragmentation
LTCTLSSGHSSYAIAWHQQQPEK,30,2,hcd
LTCTLSSGHSSYAIAWHQQQPEK,30,3,hcd
LTCTLSSGHSSYAIAWHQQQPEK,30,4,hcd
LTCTLSSGHSSYAIAWHQQQPEKGPR,30,2,hcd
LTCTLSSGHSSYAIAWHQQQPEKGPR,30,3,hcd
LTCTLSSGHSSYAIAWHQQQPEKGPR,30,4,hcd
LTCTLSSGHSSYAIAWHQQQPEKGPRYLMK,30,2,hcd
LTCTLSSGHSSYAIAWHQQQPEKGPRYLMK,30,3,hcd
LTCTLSSGHSSYAIAWHQQQPEKGPRYLMK,30,4,hcd
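To spot this in a generated library, a small script can count the internal trypsin sites in each sequence of prosit_input.csv. This is a hypothetical helper, not part of Oktoberfest. It assumes the simple trypsin rule (cleave after K or R unless the next residue is P), which may differ from the rule Oktoberfest applies internally, and that modifications appear as bracketed tags in modified_sequence.

```python
import csv
import io
import re


def count_missed_cleavages(sequence: str, special_aas: str = "KR") -> int:
    """Count internal trypsin sites (K/R not followed by P) in a peptide."""
    # Drop bracketed modification tags such as "[UNIMOD:35]" before scanning residues.
    residues = re.sub(r"\[[^\]]*\]", "", sequence)
    missed = 0
    # Pair each residue with its successor; the last residue is never a "current",
    # because a C-terminal K/R is the cleavage that produced the peptide.
    for current, following in zip(residues, residues[1:]):
        if current in special_aas and following != "P":
            missed += 1
    return missed


def peptides_exceeding(csv_text: str, max_missed: int) -> dict:
    """Return {sequence: missed_cleavages} for unique sequences above max_missed."""
    violations = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        seq = row["modified_sequence"]
        n = count_missed_cleavages(seq)
        if n > max_missed:
            violations[seq] = n  # rows repeat per charge state; the dict deduplicates
    return violations


# A few rows copied from the prosit_input.csv output above.
SAMPLE = """modified_sequence,collision_energy,precursor_charge,fragmentation
LTCTLSSGHSSYAIAWHQQQPEK,30,2,hcd
LTCTLSSGHSSYAIAWHQQQPEKGPR,30,2,hcd
LTCTLSSGHSSYAIAWHQQQPEKGPRYLMK,30,2,hcd
"""

if __name__ == "__main__":
    for seq, n in peptides_exceeding(SAMPLE, max_missed=0).items():
        print(f"{seq}: {n} missed cleavage(s)")
```

For a real library, pass the contents of the file instead, e.g. peptides_exceeding(open("prosit_input.csv").read(), max_missed=0). An empty result means no sequence exceeds the requested number of missed cleavages.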
The README on GitHub (https://github.com/wilhelm-lab/oktoberfest) says the key is called "cleavages",
while the documentation (https://oktoberfest.readthedocs.io/en/latest/jobs.html#b-spectral-library-generation) says it is called "missedCleavages".
There is no error or warning when the config file is read. It might be a good idea to print the key:value pairs to stdout after parsing the config, so users can check that parsing went as expected.
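The suggestion above could look roughly like this: after loading the JSON, echo the digest options and warn about any key that would be silently ignored. This is only a sketch, not Oktoberfest's configuration code; KNOWN_DIGEST_KEYS and report_digest_options are hypothetical names, and the key list is assembled from the config in this issue with "cleavages" replaced by the documented "missedCleavages".

```python
import json
import warnings

# Hypothetical set of accepted keys for "fastaDigestOptions" (an assumption,
# built from this issue's config plus the documented "missedCleavages").
KNOWN_DIGEST_KEYS = {
    "fragmentation", "digestion", "missedCleavages", "minLength",
    "maxLength", "enzyme", "specialAas", "db",
}


def report_digest_options(config: dict) -> list:
    """Print the parsed digest options and return the keys that would be ignored."""
    options = config.get("fastaDigestOptions", {})
    for key, value in options.items():
        print(f"fastaDigestOptions.{key}: {value!r}")
    unknown = sorted(set(options) - KNOWN_DIGEST_KEYS)
    for key in unknown:
        warnings.warn(f"Unknown key fastaDigestOptions.{key} will be ignored")
    return unknown


if __name__ == "__main__":
    cfg = json.loads('{"fastaDigestOptions": {"cleavages": 0, "enzyme": "trypsin"}}')
    report_digest_options(cfg)  # warns that "cleavages" is not a recognized key
```

Warning (rather than failing) keeps older config files usable while still making a misspelled or renamed key visible.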
System:
python
Python 3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:53:40)
[Clang 14.0.6 ] on darwin
Not a bug in the code, but a change that wasn't properly documented. Please use "missedCleavages" and try again. The problem is that Oktoberfest does not find this key when reading the config file and falls back to the default, which is 2. I will correct the documentation in the README.md, the configuration file specification, and the tutorial notebook accordingly.
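For existing config files, the fix amounts to renaming "cleavages" to "missedCleavages" inside "fastaDigestOptions"; otherwise the key is ignored and the default of 2 applies. A minimal sketch of a helper that does this on a parsed config (migrate_cleavages_key is a hypothetical name, not part of Oktoberfest):

```python
import json


def migrate_cleavages_key(config: dict) -> dict:
    """Return a copy of config with fastaDigestOptions.cleavages renamed to missedCleavages."""
    migrated = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    options = migrated.get("fastaDigestOptions", {})
    if "cleavages" in options:
        value = options.pop("cleavages")
        # If missedCleavages is already set explicitly, keep that value.
        options.setdefault("missedCleavages", value)
    return migrated


if __name__ == "__main__":
    old = {"fastaDigestOptions": {"cleavages": 0, "enzyme": "trypsin"}}
    print(json.dumps(migrate_cleavages_key(old), indent=2))
```

To update a file on disk, load it with json.load, pass the result through the helper, and write it back with json.dump.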
okidoki!