A simple Python script to fix scan numbers in result files generated by Proteome Discoverer. Proteome Discoverer does not read scan numbers from MGF files correctly and instead numbers the spectra sequentially from 0 to n. This script remaps the scan numbers assigned by Proteome Discoverer back to the original scan numbers given in the MGF file.
By default, this script fixes result files of MS Annika, but it can be adapted to other result files.
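The core remapping can be sketched in a few lines of Python. This is a minimal sketch, not the script's actual implementation. It assumes that the i-th spectrum in the MGF file corresponds to Proteome Discoverer's scan number i (0-based, as described above) and that the original scan number is extracted from each `TITLE=` line with the default pattern `\.\d+\.` from the help text. The helper names `read_mgf_scan_numbers` and `remap_scan_numbers` are hypothetical, and plain lists stand in for the .csv/.xlsx table the script actually reads.

```python
import re

# Default pattern from the help text: digits enclosed by two dots,
# e.g. "run01.1234.1234.2" -> first match ".1234." -> scan number 1234.
DEFAULT_PATTERN = r"\.\d+\."


def read_mgf_scan_numbers(mgf_lines, pattern=DEFAULT_PATTERN):
    """Hypothetical helper: return the original scan numbers in MGF order."""
    regex = re.compile(pattern)
    scans = []
    for line in mgf_lines:
        line = line.strip()
        if line.startswith("TITLE="):
            match = regex.search(line)
            if match is None:
                raise ValueError(f"No scan number found in title: {line}")
            # Drop the surrounding dots before converting to an integer.
            scans.append(int(match.group(0).strip(".")))
    return scans


def remap_scan_numbers(wrong_scans, original_scans):
    """Hypothetical helper: map sequential (assumed 0-based) numbers to MGF scans."""
    return [original_scans[i] for i in wrong_scans]
```

For example, with two spectra titled `run01.1234.1234.2` and `run01.1240.1240.3`, `read_mgf_scan_numbers` returns `[1234, 1240]`, and `remap_scan_numbers([1, 0], [1234, 1240])` returns `[1240, 1234]`.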
"""
DESCRIPTION:
A simple script to fix scan numbers wrongly parsed by Proteome Discoverer.
Takes one table (e.g. PSMs, CSMs, etc.) in .xlsx or .csv format as input, plus
the corresponding MGF file containing the spectra. Additionally, the name of
the column storing the (wrong) scan numbers in the table can be given; by
default the column name "First Scan" (as used by MS Annika) is used. For
non-standard MGF files, a regex pattern for parsing the scan number from the
spectrum title can be supplied.
USAGE:
scan_nr_repair_tool.py -d/--data
-m/--mgf
[-c/--colname]
[-p/--pattern]
[-o/--output]
required arguments:
-d str, --data str
Input data file to be fixed in .csv or .xlsx format.
-m str, --mgf str
Input spectra file in mgf format.
optional arguments:
-c str, --colname str
Name of the column that holds the scan numbers in the input data file.
Default: "First Scan"
-p str, --pattern str
Regex pattern used to get the scan number from the spectrum title if it can't be inferred automatically.
Default: "\\.\\d+\\."
-o str, --output str
Name of the output file.
Default: Name of the input data file + "_fixed.xlsx"
-h, --help
Show this help message and exit.
--version
Show program's version number and exit.
"""