File exists error mzml input
Hey,
First of all, thanks for the great tool. I was trying to run it on timsTOF data with MSFragger search results. Since no RAW input is available for this kind of data, I went with a generated mzML file. However, I keep getting a "File exists" error, please see the log:
2023-09-14 00:21:48,961 - INFO - oktoberfest::main Issued command: run_oktoberfest.py --config_path test.json
2023-09-14 00:21:48,961 - INFO - oktoberfest.utils.config::read Reading configuration from test.json
2023-09-14 00:21:48,967 - INFO - oktoberfest.runner::run_rescoring Starting rescoring run...
2023-09-14 00:21:48,968 - INFO - oktoberfest.utils.config::read Reading configuration from test.json
2023-09-14 00:21:48,993 - INFO - oktoberfest.ce_calibration::_load_search search_type is msfragger
2023-09-14 00:21:48,993 - INFO - oktoberfest.ce_calibration::_gen_internal_search_result_from_msms Converting msms data at T03062_EvoAurEl3_20SPDDDAPASEF-IMP-CMB-1356_1_S1-A1_1_3409.pepXML to internal search result.
100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:38<00:00, 38.85s/it]
2023-09-14 00:22:28,005 - INFO - spectrum_io.search_result.search_results::filter_valid_prosit_sequences #sequences before filtering for valid prosit sequences: 47010
2023-09-14 00:22:28,040 - INFO - spectrum_io.search_result.search_results::filter_valid_prosit_sequences #sequences after filtering for valid prosit sequences: 41835
2023-09-14 00:22:28,692 - INFO - oktoberfest.re_score::split_msms Read 41835 PSMs from out/msms/msms.prosit
2023-09-14 00:22:28,714 - INFO - oktoberfest.re_score::split_msms Creating split search results file out/msms/T03062_EvoAurEl3_20SPDDDAPASEF-IMP-CMB-1356_1_S1-A1_1_3409.rescore
Traceback (most recent call last):
  File "/home/pawil/.local/bin/oktoberfest", line 8, in <module>
    sys.exit(main())
  File "/home/pawil/.local/lib/python3.10/site-packages/oktoberfest/run_oktoberfest.py", line 30, in main
    runner.run_job(args.config_path)
  File "/home/pawil/.local/lib/python3.10/site-packages/oktoberfest/runner.py", line 211, in run_job
    run_rescoring(msms_path, search_dir, config_path, output_path)
  File "/home/pawil/.local/lib/python3.10/site-packages/oktoberfest/runner.py", line 171, in run_rescoring
    re_score.calculate_features()
  File "/home/pawil/.local/lib/python3.10/site-packages/oktoberfest/re_score.py", line 168, in calculate_features
    mzml_path.mkdir(exist_ok=True)
  File "/usr/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileExistsError: [Errno 17] File exists: 'T03062_EvoAurEl3_20SPDDDAPASEF-IMP-CMB-1356_1_S1-A1_1_3409_uncalibrated.mzML'
I tried this on other data as well with the same result, so I am unsure whether it's really a bug or a mistake on my side.
Thanks in advance!
Hello @patrick-willems, timsTOF support is not yet added, but we are actively working on getting this integrated (#115). Concerning your issue:
I suspect that the path you provided in the "spectra" option points to the mzML file itself rather than to the directory containing it. That would explain the FileExistsError: Oktoberfest tries to create a directory for the mzML files at that path, and a regular file already exists there. Apparently, this is not checked properly.
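To illustrate why `mkdir(exist_ok=True)` in the traceback above can still fail, here is a minimal sketch using only the Python standard library. `exist_ok=True` suppresses the error only when the existing path is a directory; if the path is a regular file, `FileExistsError` is raised anyway. The file name `example_uncalibrated.mzML` is a hypothetical stand-in, and the sketch does not reproduce Oktoberfest's own code.

```python
import tempfile
from pathlib import Path


def mkdir_on_existing_file_fails() -> bool:
    """Return True if mkdir(exist_ok=True) raises FileExistsError on a regular file."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name standing in for the user's mzML file.
        mzml_file = Path(tmp) / "example_uncalibrated.mzML"
        mzml_file.touch()
        try:
            # exist_ok=True only suppresses the error for existing *directories*.
            mzml_file.mkdir(exist_ok=True)
        except FileExistsError:
            return True
        return False


if __name__ == "__main__":
    print(mkdir_on_existing_file_fails())
```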
For now, you could try to provide the folder without the file itself and see if that works. You just need to make sure that "T03062_EvoAurEl3_20SPDDDAPASEF-IMP-CMB-1356_1_S1-A1_1_3409_uncalibrated.mzML" is the only mzML file in that folder, since Oktoberfest scans for all mzML files in the provided spectra directory.
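Before rerunning, it may help to check what the spectra directory actually contains. The following is a rough pre-check sketch, not Oktoberfest's own scanning logic (which is not shown in this thread); the helper name `find_mzml_files` is hypothetical, and matching the extension case-insensitively is an assumption.

```python
from pathlib import Path
from typing import List, Union


def find_mzml_files(spectra_dir: Union[str, Path]) -> List[Path]:
    """List the mzML files directly inside spectra_dir (hypothetical helper)."""
    spectra_dir = Path(spectra_dir)
    if not spectra_dir.is_dir():
        # This is the situation from the issue: a file path was given instead of a folder.
        raise NotADirectoryError(f"{spectra_dir} is not a directory")
    # Assumption: the extension is matched case-insensitively.
    return sorted(
        p for p in spectra_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".mzml"
    )
```

If the returned list has more than one entry, the extra files would need to be moved elsewhere before pointing "spectra" at the folder.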
Should this also fail, you could copy your file directly into "output"/mzML/, where "output" is the output directory you provided. Oktoberfest should then detect that an mzML file is already in the output folder and skip the conversion.
As a last resort, you could tell Oktoberfest that "spectra_type" is "raw" and provide a dummy file named "T03062_EvoAurEl3_20SPDDDAPASEF-IMP-CMB-1356_1_S1-A1_1_3409_uncalibrated.raw" in the directory specified with "spectra". As long as the mzML file is already in "out"/mzML/, Oktoberfest will only see that an mzML file with the same name as the dummy raw file exists, assume that it has already converted it, skip the conversion, and carry on.
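The two fallback workarounds above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper name `stage_mzml` and its parameters are hypothetical, and the `mzML` subfolder name inside the output directory is taken from the description in this reply rather than from Oktoberfest's code.

```python
import shutil
from pathlib import Path
from typing import Optional, Union


def stage_mzml(
    mzml_file: Union[str, Path],
    output_dir: Union[str, Path],
    spectra_dir: Optional[Union[str, Path]] = None,
) -> Path:
    """Copy an mzML file into <output_dir>/mzML and optionally create a dummy .raw file."""
    mzml_file = Path(mzml_file)
    # Assumption: Oktoberfest looks for converted files in <output_dir>/mzML.
    target_dir = Path(output_dir) / "mzML"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / mzml_file.name
    if not target.exists():
        shutil.copy2(mzml_file, target)
    if spectra_dir is not None:
        # Empty placeholder with the same base name, for the "spectra_type": "raw" workaround.
        dummy_raw = Path(spectra_dir) / (mzml_file.stem + ".raw")
        dummy_raw.parent.mkdir(parents=True, exist_ok=True)
        dummy_raw.touch()
    return target
```

Calling it without `spectra_dir` covers the plain copy workaround; passing `spectra_dir` additionally creates the empty placeholder file for the last-resort variant.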
In the new API, which I hope to release next week, this is handled better.
Please keep in mind though, that timsTOF is not yet tested properly and that your mzML might actually not be supported.
Thanks for the reply,
Indeed, specifying the directory resolved the first hurdle, though rescoring now fails with:
"AssertionError: The mass analyzer with accession MS:1000031 is not supported."
Looking forward to the timsTOF implementation, eager to rescore some results and test the performance.
Best
Patrick
Just wanted to mention that I submitted a related issue, #129. I was trying to CE calibrate FragPipe results based on raw data files in mzML format and got:
  File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 34, in check_analyzer
    raise AssertionError(f"The mass analyzer with accession {accession} is not supported.")
AssertionError: The mass analyzer with accession MS:1000081 is not supported.
"AssertionError: The mass analyzer with accession MS:1000031 is not supported."
I published a hotfix release for spectrum-io (v0.3.3), because the analyzer check was only there to determine whether we have default values for the mass tolerance and unit. As long as you supply these yourself, it should be fine. If you install the newest release of oktoberfest (v0.5.0), this error should be gone. The release will be published tonight and the issue will be closed accordingly. Please reopen it should you still encounter the problem.