Find protein complexes from high-throughput AP-MS data
Python implementation for Markov Random Field model of AP-MS experiments.
Rungsarityotin, W., Krause, R., Schoedl, A., Schliep, A. (2007) BMC Bioinformatics 8:482.
Require a Python >= 3.5 installation. Since it depends on some packages that can be tricky to install using pip (numba, numpy, ...), we recommend using the Anaconda Python distribution. In case you are creating a new conda environment or using miniconda, please make sure to run conda install anaconda
before running pip, or otherwise the required packages will not be present.
In addition if you obtain the code from the development branch, we now use Tensorflow when GPU is available. The develop branch contains an implementation for large dataset like BioPlex 2.0 or BioPlex 3.0. We now have good results runnning on datasets with fewer than 4,000 proteins as well as on large human datasets with more than 10,000 proteins.
(-f) a CSV file containing a list of bait-preys experiments. The first protein at the beginning of every line is a bait protein. The option (-bp) must be given if your input file is Bioplex 2.0 or Bioplex 3.0.
- The log-ratio of false negative rate and false positive rate (-psi)
- Maximum possible number of clusters (-k), e.g. 600 for gavin2002 dataset and 800 for gavin2006
py main.py -f gavin2002.csv -psi 3.4 -k 500
or
py main.py -bp [Bioplex file] -psi 3.4 -k 500
out.tab: a tab separating file containing a cluster annotation for each protein, one per line.
out.sif: output clustering result in the Simple Interaction File format which you can import into Cytoscape.
Benchmark: S.cerevisiae (MIPS), E.coli (EcoCyC)
Ref 1 (proteins from Gavin2006 PPI network):
MIPS | MRF | ClusterONE | Mix.Bernou. |
---|---|---|---|
Accuracy | 0.504 | 0.498 | 0.450 |
Fraction | 0.643 | 0.713 | 0.713 |
Note: Best MRF runs with parameters
Ref 2 (proteins from Gavin2006 PPI network):
MIPS | MRF (w prior) | ClusterONE | Mix.Bernou. |
---|---|---|---|
Accuracy | 0.495 (0.499) | 0.498 | 0.450 |
Fraction | 0.730 (0.713) | 0.713 | 0.713 |
Ref 3 (MIPS proteins):
MIPS | MRF | ClusterONE | Mix.Bernou. |
---|---|---|---|
Accuracy | 0.396 (0.384) | 0.378 | 0.349 |
Fraction | 0.492 (0.470) | 0.428 | 0.433 |
Note: Best MRF runs with parameters
Ref 4 (proteins from Babu2018 PPI network)
EcoCyc | MRF | ClusterONE | Mix.Bernou. |
---|---|---|---|
Accuracy | 0.457 (0.465) | 0.558 | 0.454 |
Fraction | 0.386 (0.397) | 0.170 | 0.159 |
Ref 5 (EcoCyC proteins)
EcoCyC | MRF | ClusterONE | Mix.Bernou. |
---|---|---|---|
Accuracy | 0.411 (0.372) | 0.318 | 0.467 |
Fraction | 0.157 (0.121) | 0.032 | 0.186 |
Note: parameters of MRF