/pph2_and_rank

Ranks genes according to PolyPhen2 scores for a list of mutations

Primary LanguagePythonMIT LicenseMIT

pph2_and_rank.py

Submits a list of genes with SNPs to PolyPhen2 and orders them by a score measuring the overall deleteriousness of the mutations, normalized for gene product length in amino acids (AAs).

License

Written by Ted Pak for the Roth Laboratory.

Released under an MIT license; please see MIT-LICENSE.txt.

BeautifulSoup, an included library, is released under a new BSD-style license.
Please see its source code or website for more details.

Requirements

Any modern *nix with Python >=2.6 installed. This includes Mac OS X.

Usage

$ ./pph2_and_rank.py [-h|--help] [--hg19] [--humdiv] [-q|--quiet] [-s|--sid SID]
       [-o|--output OUTPUT] MUT_LIST_1 [MUT_LIST_2 ...]

Options

  • -h or --help displays a simple usage message.

  • --hg19 tells PolyPhen2 to use build 19 of the human genome assembly instead of build 18.

  • --humdiv tells PolyPhen2 to use the HumDiv PolyPhen2 classifier model instead of HumVar.

  • -o OUTPUT or --output OUTPUT will cause output to be written to the file named by OUTPUT instead of sending it to standard output.

  • -q or --quiet will suppress the status updates normally sent to standard error.

  • -s SID or --sid SID allows you to specify a previous GGI session ID that the script will re-attach to. In this case, all the MUT_LIST_$N inputs are ignored. Use this when a previous script has already submitted information to PolyPhen2 but the script was prematurely terminated. The SID is displayed in the status messages that this script prints to standard error once it has been received from PolyPhen's GGI status page.

Input

MUT_LIST_1, MUT_LIST_2 and so on must be files where each line follows the form acc_id mut_1 mut_2 mut_3 ... and each mut_$n is the concatenation of the canonical AA, the position in the AA sequence, and the mutated AA. For example:

Q92889 I706T E875G
Q6ZSZ5 L763F
Q32M45 Q390P E463Q A549T E570K P686H

If any one of the MUT_LIST_$N arguments is set to - (hyphen), the list is read from standard input until EOF is encountered. Therefore, you can also do:

$ cat mutations_1.txt mutations_2.txt | ./pph2_and_rank.py -

Output

The output is a tab-delimited table of genes sorted by their score, where the score is produced by summing the PolyPhen2 scores, dividing by the AA sequence length of the gene product, and multiplying by (the completely arbitrary constant) 1000. It is sent to standard output, with progress messages sent to standard error.

To save the output from the script, either redirect standard output:

$ ./pph2_and_rank.py mutations.txt > output.txt

or use the -o or --output option:

$ ./pph2_and_rank.py -o output.txt mutations.txt