1. Project Overview

This project uses a human protein-protein interaction (PPI) interactome (PathLinker_2018_human-ppi-weighted-cap0_75.txt), found here, to perform biological network analysis. The results of the analysis are then displayed in a web application. All resulting .txt and .csv files are saved in the folder named exportedfiles.

(video: ppi-analysis-project.mp4)

2. Project Dependencies

  1. NetworkX
  2. Pandas
  3. Matplotlib
  4. NumPy
  5. Unipressed
  6. Flask

3. Project Description

The analysis consisted of the following steps:

  • Constructing the weighted graph of a subset of the interactome (100 samples) (figure: weightedGraphonly100)

  • Listing the 100 proteins with the highest degree centrality in a .csv file (Top100DegreeCentralityProteins.csv)

  • Finding the shortest unweighted path between the two proteins Q8TBF4 and P55157 using Dijkstra's algorithm (figure: shortestpath-unweighted)

  • Finding the shortest weighted path between the two proteins Q8TBF4 and P55157 using Dijkstra's algorithm (figure: shortestpath-weighted)

  • Visualizing all shortest paths using Dijkstra's algorithm and listing the resulting paths in a .txt file (allpaths.txt) (figure: allshortestpaths)

  • Listing all proteins directly connected to P05067, the protein with the highest degree centrality, in a .txt file (protein_connections.txt). The first column is the connected protein and the second is the weight of the interaction. The last two rows give the in-degree and out-degree of P05067.

  • Ranking a set of proteins by in-degree in descending order and saving the ranked list in a .txt file (sorted_setOfProteins.txt)

  • Plotting a histogram of the in-degree of a set of proteins (figure: histogram)

  • Mapping a list of UniProt IDs to their corresponding gene names in a .txt file (UniProtIDtoGeneName.txt)

  • Constructing the unweighted graph of a subset of the interactome (2500 samples) (figure: unweightedGraph)

  • Saving the adjacency matrix of the unweighted subset in a .txt file (adjacency1.txt)

  • Constructing and visualizing the minimum spanning tree using Kruskal's algorithm (figure: minimumspanningtree)
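
Several of the steps above can be sketched with NetworkX on a toy graph. This is a minimal illustration, not the project's actual code. The edge list, its weights, and the node names X1, X2, and X3 are hypothetical; only the protein IDs Q8TBF4, P55157, and P05067 come from the analysis. The sketch also assumes that the interactome is directed (the analysis reports in- and out-degrees) and that the edge weight is used directly as the path cost.

```python
import networkx as nx

# Hypothetical toy edges (tail, head, weight). The weights and the nodes
# X1, X2, X3 are made up for illustration; they are NOT interactome data.
EDGES = [
    ("Q8TBF4", "P05067", 0.9),
    ("P05067", "P55157", 0.8),
    ("Q8TBF4", "X1", 0.2),
    ("X1", "X2", 0.2),
    ("X2", "P55157", 0.2),
    ("X3", "P05067", 0.7),
]


def build_graph(edges):
    """Build a directed, weighted graph from (tail, head, weight) triples."""
    graph = nx.DiGraph()
    graph.add_weighted_edges_from(edges)  # stores the weight under "weight"
    return graph


def top_by_degree_centrality(graph, n=100):
    """Return the n (node, centrality) pairs with the highest degree centrality."""
    centrality = nx.degree_centrality(graph)
    ranked = sorted(centrality.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


graph = build_graph(EDGES)

# Degree centrality ranking (top 3 on the toy graph).
top = top_by_degree_centrality(graph, n=3)

# Unweighted shortest path: fewest edges, weights ignored.
unweighted_path = nx.shortest_path(graph, "Q8TBF4", "P55157")

# Weighted shortest path: Dijkstra's algorithm, minimizing the summed weights.
weighted_path = nx.dijkstra_path(graph, "Q8TBF4", "P55157", weight="weight")

# Minimum spanning tree with Kruskal's algorithm. NetworkX only defines
# spanning trees on undirected graphs, so edge direction is dropped first.
mst = nx.minimum_spanning_tree(
    graph.to_undirected(), weight="weight", algorithm="kruskal"
)

# Nodes ranked by in-degree, highest first.
in_degree_ranking = sorted(
    graph.in_degree(), key=lambda item: item[1], reverse=True
)

print("Most central node:", top[0][0])
print("Unweighted path:", unweighted_path)
print("Weighted path:", weighted_path)
print("MST edges:", sorted(mst.edges()))
```

On this toy graph the unweighted path goes through P05067 in two hops, while the weighted path takes the three low-weight hops through X1 and X2. The difference shows why the two shortest-path steps are reported separately.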

4. Results

The analysis visualized how PPIs concentrate around a small number of highly connected proteins. It also suggests that the proteins with the highest degree centrality play a vital role in physiological functions that require extensive interaction with other proteins.