Primary language: Python. License: MIT.

Proteome Discoverer MGF Scan Number Repair Tool

A simple Python script to fix scan numbers in result files generated by Proteome Discoverer. Proteome Discoverer doesn't read scan numbers from MGF files correctly; instead, it numbers the spectra consecutively from 0 to n in the order they appear in the file. This script maps the scan numbers assigned by Proteome Discoverer back to the original scan numbers given in the MGF file.

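By default, the script is set up for MS Annika result files, where the scan numbers are stored in the "First Scan" column. For other result files, the column name and the pattern used to parse scan numbers from spectrum titles can be changed via the command-line options described below.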
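The remapping idea can be illustrated with a minimal sketch. This is not the tool's actual implementation: the function names `read_scan_numbers` and `repair` are hypothetical, the example titles are made up, and the sketch assumes that the number assigned by Proteome Discoverer is the zero-based position of the spectrum in the MGF file and that the scan number sits between two dots in the `TITLE=` line.

```python
import re

def read_scan_numbers(mgf_lines, pattern=r"\.\d+\."):
    """Return the original scan number of every spectrum, in file order.

    Hypothetical helper: assumes each spectrum has a TITLE= line that
    contains the scan number in a form matched by `pattern`.
    """
    scans = []
    for line in mgf_lines:
        if line.startswith("TITLE="):
            match = re.search(pattern, line)
            if match is None:
                raise ValueError(f"No scan number found in: {line.strip()}")
            # Strip the surrounding dots matched by the default pattern.
            scans.append(int(match.group(0).strip(".")))
    return scans

def repair(pd_scan_numbers, original_scans):
    """Map zero-based Proteome Discoverer numbers to original scan numbers."""
    return [original_scans[i] for i in pd_scan_numbers]

# Made-up MGF excerpt with two spectra.
mgf_lines = [
    "BEGIN IONS",
    "TITLE=run1.1042.1042.2",
    "END IONS",
    "BEGIN IONS",
    "TITLE=run1.1057.1057.3",
    "END IONS",
]

original = read_scan_numbers(mgf_lines)   # [1042, 1057]
fixed = repair([1, 0, 1], original)       # [1057, 1042, 1057]
print(fixed)
```

In a real result table, the list passed to `repair` would be the values of the scan-number column (for example "First Scan"), and the returned list would replace that column.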

Usage

"""
DESCRIPTION:
A simple script to fix wrongly parsed scan numbers generated by Proteome
Discoverer. Takes one table (e.g. PSMs, CSMs, etc.) as input in .xlsx or .csv
format plus the corresponding mgf file containing the spectra. Additionally,
the name of the column storing the (wrong) scan numbers in the table has to be
given. By default the column name "First Scan" is used (as in MS Annika). For
non-standard mgf files a regex pattern for parsing the scan number from the
title can be supplied.
USAGE:
scan_nr_repair_tool.py [-d --data]
                       [-m --mgf]
                       [-c --colname]
                       [-p --pattern]
                       [-o --output]
required arguments:
    -d str, --data str
        Input data file to be fixed in .csv or .xlsx format.
    -m str, --mgf str
        Input spectra file in mgf format.
optional arguments:
    -c str, --colname str
        The column name of the column that holds the scan numbers in the input data file.
        Default: "First Scan"
    -p str, --pattern str
        Regex pattern to be used to get the scan number from the title if it can't be automatically inferred.
        Default: "\\.\\d+\\."
    -o str, --output str
        Name of the output file.
        Default: Name of the input data file + "_fixed.xlsx"
    -h, --help
        Show this help message and exit.
    --version
        Show program's version number and exit.
"""
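To get a feel for the `--pattern` option, the following sketch shows how a regex can pull a scan number out of a spectrum title. It is an illustration only: `scan_from_title` is a hypothetical helper, the titles are invented, and taking the first run of digits inside the match is an assumption, not necessarily how the tool post-processes the match. The doubled backslashes in the default shown in the help text are presumably Python string escaping for the regex `\.\d+\.`.

```python
import re

def scan_from_title(title, pattern):
    """Return the scan number found in `title`, or None if nothing matches.

    Hypothetical helper: searches `title` with `pattern`, then takes the
    first run of digits inside the matched text.
    """
    match = re.search(pattern, title)
    if match is None:
        return None
    digits = re.search(r"\d+", match.group(0))
    return int(digits.group(0)) if digits else None

# Title in the common "name.scan.scan.charge" layout, default-style pattern.
print(scan_from_title("sample.815.815.2", r"\.\d+\."))

# Invented non-standard title that needs a custom pattern.
print(scan_from_title("Spectrum scan=2231 charge=2", r"scan=\d+"))
```

Testing a candidate pattern on a few titles copied from the MGF file in this way is a quick check before passing it to the script with `-p`.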

License

Contact