/graphein

Protein Graph Library

DOI: 10.1101/2020.07.15.204701 | Project Status: Active (the project has reached a stable, usable state and is being actively developed) | License: MIT

Documentation | Paper

This package produces several types of graph-based representations of proteins. It is compatible with standard formats and also provides graph objects designed for use with popular deep learning libraries.

What's New?

  • Protein Graph Visualisation!
  • RNA Graph Construction from Dotbracket notation

Example usage

Creating a Protein Graph

from graphein.construct_graphs import ProteinGraph

# Initialise ProteinGraph class
pg = ProteinGraph(granularity='CA', insertions=False, keep_hets=True,
                  node_featuriser='meiler', get_contacts_path='/Users/arianjamasb/github/getcontacts',
                  pdb_dir='examples/pdbs/',
                  contacts_dir='examples/contacts/',
                  exclude_waters=True, covalent_bonds=False, include_ss=True)

# Create residue-level graphs. Chain selection is either 'all' or a list e.g. ['A', 'B', 'D'] specifying the polypeptide chains to capture

# DGLGraph From PDB Accession Number
graph = pg.dgl_graph_from_pdb_code('3eiy', chain_selection='all')
# DGLGraph From PDB file
graph = pg.dgl_graph_from_pdb_file(file_path='examples/pdbs/pdb3eiy.pdb', contact_file='examples/contacts/3eiy_contacts.tsv', chain_selection='all')

# Create atom-level graphs
graph = pg._make_atom_graph(pdb_code='3eiy', graph_type='bigraph')
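
The `contact_file` used above is a tab-separated file written by GetContacts. The sketch below shows one way such a file could be collapsed into residue-level edges. It is not Graphein's implementation: the helper name `residue_edges` is hypothetical, and the row layout (frame, interaction type, then atoms written as `chain:resname:resid:atomname`, with `#` header lines) is an assumption about the GetContacts output format.

```python
from collections import defaultdict


def residue_edges(tsv_text):
    """Collapse atom-level contacts into residue-level edges.

    Assumed row layout (tab-separated): frame, interaction_type, atom_1, atom_2
    Assumed atom layout: chain:resname:resid:atomname
    Returns a dict mapping a sorted residue pair to its set of interaction types.
    """
    edges = defaultdict(set)
    for line in tsv_text.splitlines():
        # Skip blank lines and '#' header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        interaction_type, atom_1, atom_2 = fields[1], fields[2], fields[3]
        # Drop the atom name to obtain a residue identifier.
        res_1 = tuple(atom_1.split(":")[:3])
        res_2 = tuple(atom_2.split(":")[:3])
        if res_1 == res_2:
            continue  # ignore contacts within a single residue
        edge = tuple(sorted((res_1, res_2)))
        edges[edge].add(interaction_type)
    return dict(edges)


# Hypothetical contact records, for illustration only.
example_tsv = (
    "# total_frames:1 interaction_types:hbbb,vdw\n"
    "# Columns: frame, interaction_type, atom_1, atom_2\n"
    "0\thbbb\tA:ALA:1:N\tA:GLY:5:O\n"
    "0\tvdw\tA:GLY:5:CA\tA:ALA:1:CB\n"
    "0\tvdw\tA:ALA:1:CA\tA:ALA:1:CB\n"
)
edges = residue_edges(example_tsv)
```

Sorting each residue pair makes the edges undirected, so the two contacts between ALA 1 and GLY 5 merge into a single edge that carries both interaction types.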

Creating a Protein Mesh

from graphein.construct_meshes import ProteinMesh
# Initialise ProteinMesh class
pm = ProteinMesh()

# Pytorch3D Mesh Object from PDB Code
verts, faces, aux = pm.create_mesh(pdb_code='3eiy', out_dir='examples/meshes/')
# Pytorch3D Mesh Object from PDB File
verts, faces, aux = pm.create_mesh(pdb_file='examples/pdbs/pdb3eiy.pdb')

Creating an RNA Graph

from graphein.construct_graphs import RNAGraph
# Initialise RNAGraph Constructor
rg = RNAGraph()
# Build the graph from a dotbracket & optional sequence
rna = rg.dgl_graph_from_dotbracket('..(((((..(((...)))..)))))...', sequence='UUGGAGUACACAACCUGUACACUCUUUC')
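
To make the dot-bracket input concrete, the following standalone sketch turns a dot-bracket string into two edge lists: backbone edges between consecutive nucleotides and base-pair edges between matched brackets. It illustrates the notation only and is not the code behind `dgl_graph_from_dotbracket`. The function name `dotbracket_to_edges` is hypothetical, and only the `.`, `(` and `)` symbols are handled.

```python
def dotbracket_to_edges(dotbracket, sequence=None):
    """Return (backbone_edges, base_pair_edges) as lists of index pairs."""
    if sequence is not None and len(sequence) != len(dotbracket):
        raise ValueError("sequence and dot-bracket string differ in length")

    # Backbone: each nucleotide is linked to its successor.
    backbone = [(i, i + 1) for i in range(len(dotbracket) - 1)]

    # Base pairs: match each ')' with the most recent unmatched '('.
    base_pairs = []
    stack = []
    for i, symbol in enumerate(dotbracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            base_pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return backbone, base_pairs


backbone, base_pairs = dotbracket_to_edges(
    "..(((((..(((...)))..)))))...",
    sequence="UUGGAGUACACAACCUGUACACUCUUUC",
)
print(len(backbone), len(base_pairs))  # 27 8
```

For the 28-nucleotide example this gives 27 backbone edges and 8 base pairs, with the inner hairpin closing first at `(11, 15)`.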

Parameters

Graphs can be constructed according to walks through the graph in the figure below.

granularity: {'CA', 'CB', 'atom'} - specifies node-level granularity of graph
insertions: bool - keep atoms with multiple insertion positions
keep_hets: bool - keep hetatoms
node_featuriser: {'meiler', 'kidera'} - low-dimensional embeddings of amino acid physico-chemical properties
pdb_dir: path to pdb files
contacts_dir: path to contact files generated by get_contacts
get_contacts_path: path to GetContacts installation
exclude_waters: bool - exclude structural waters (set to False to retain them)
covalent_bonds: bool - maintain covalent bond edges or just use intramolecular interactions
include_ss: bool - calculate protein secondary structure (SS) and surface features using DSSP and assign them as node features
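
As a rough illustration of how the node-selection parameters interact, the sketch below filters fixed-width PDB coordinate records by `granularity`, `insertions`, `keep_hets` and `exclude_waters`. It is a simplified model under stated assumptions, not Graphein's code: the helpers `pdb_line` and `select_nodes` are hypothetical, `insertions=False` is taken to mean "drop residues that carry an insertion code", and the granularity filter is applied to ATOM records only.

```python
def pdb_line(record, serial, name, res_name, chain, res_seq, x, y, z, icode=" "):
    """Format columns 1-54 of a PDB ATOM/HETATM record (hypothetical helper)."""
    return (f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain}"
            f"{res_seq:>4}{icode}   {x:8.3f}{y:8.3f}{z:8.3f}")


def select_nodes(lines, granularity="CA", insertions=False,
                 keep_hets=True, exclude_waters=True):
    """Return (chain, res_seq, icode, res_name, atom_name) for each kept atom."""
    nodes = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        chain = line[21]
        res_seq = int(line[22:26])
        icode = line[26].strip()
        if record == "HETATM":
            if not keep_hets:
                continue
            if res_name == "HOH" and exclude_waters:
                continue
        elif granularity != "atom" and atom_name != granularity:
            continue  # keep only the chosen atom type for residue-level graphs
        if icode and not insertions:
            continue  # assumption: drop residues with an insertion code
        nodes.append((chain, res_seq, icode, res_name, atom_name))
    return nodes


# Hypothetical records, for illustration only.
records = [
    pdb_line("ATOM", 1, "N", "ALA", "A", 1, 0.0, 0.0, 0.0),
    pdb_line("ATOM", 2, "CA", "ALA", "A", 1, 1.5, 0.0, 0.0),
    pdb_line("ATOM", 3, "CB", "ALA", "A", 1, 2.0, 1.4, 0.0),
    pdb_line("ATOM", 4, "CA", "GLY", "A", 2, 3.8, 0.0, 0.0),
    pdb_line("ATOM", 5, "CA", "GLY", "A", 2, 5.0, 1.0, 0.0, icode="A"),
    pdb_line("HETATM", 6, "ZN", "ZN", "A", 101, 9.0, 9.0, 9.0),
    pdb_line("HETATM", 7, "O", "HOH", "A", 201, 7.0, 7.0, 7.0),
]
ca_nodes = select_nodes(records)
all_atoms = select_nodes(records, granularity="atom", insertions=True, keep_hets=False)
```

With the defaults, the two C-alpha atoms and the zinc ion are kept, while the inserted residue and the water are dropped.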

Installation

  1. Create env:

    conda create --name graphein python=3.7
    conda activate graphein
  2. Install GetContacts

    Installation Instructions

    MacOS

     # Install get_contact_ticc.py dependencies
     $ conda install scipy numpy scikit-learn matplotlib pandas cython seaborn
     $ pip install ticc==0.1.4
      
     # Install vmd-python dependencies
     $ conda install netcdf4 numpy pandas seaborn expat tk=8.5  # Alternatively use pip
     $ brew install netcdf pyqt # Assumes https://brew.sh/ is installed
    
     # Install vmd-python library
     $ conda install -c conda-forge vmd-python
    
     # Set up getcontacts library
     $ git clone https://github.com/getcontacts/getcontacts.git
     $ echo "export PATH=`pwd`/getcontacts:\$PATH" >> ~/.bash_profile
     $ source ~/.bash_profile
    
     # Test installation
     $ cd getcontacts/example/5xnd
     $ get_dynamic_contacts.py --topology 5xnd_topology.pdb \
                               --trajectory 5xnd_trajectory.dcd \
                               --itypes hb \
                               --output 5xnd_hbonds.tsv

    Linux

       
      # Make sure you have git and conda installed and then run
    
      # Install get_contact_ticc.py dependencies
      conda install scipy numpy scikit-learn matplotlib pandas cython
      pip install ticc==0.1.4
      
      # Set up vmd-python library
      conda install -c https://conda.anaconda.org/rbetz vmd-python
      
      # Set up getcontacts library
      git clone https://github.com/getcontacts/getcontacts.git
      echo "export PATH=`pwd`/getcontacts:\$PATH" >> ~/.bashrc
      source ~/.bashrc
    
    
  3. Install Biopython & RDKit:

    N.B. DGLLife requires rdkit==2018.09.3

    conda install biopython
    conda install -c conda-forge rdkit==2018.09.3
  4. Install DSSP:

    DSSP is used to compute secondary structure and surface features (see include_ss).

    $ conda install -c salilab dssp
  5. Install PyTorch, DGL and DGL LifeSci:

    N.B. Make sure to install the build that matches your CUDA version.

    # Install PyTorch: MacOS
    $ conda install pytorch torchvision -c pytorch                      # Only CPU Build
    
    # Install PyTorch: Linux
    $ conda install pytorch torchvision cpuonly -c pytorch              # For CPU Build
    $ conda install pytorch torchvision cudatoolkit=9.2 -c pytorch      # For CUDA 9.2 Build
    $ conda install pytorch torchvision cudatoolkit=10.1 -c pytorch     # For CUDA 10.1 Build
    $ conda install pytorch torchvision cudatoolkit=10.2 -c pytorch     # For CUDA 10.2 Build
    
    # Install DGL. N.B. We require 0.4.3 until compatibility with DGL 0.5.0+ is implemented
    $ pip install dgl==0.4.3
    
    # Install DGL LifeSci
    $ conda install -c dglteam dgllife
  6. Install PyTorch Geometric:

    $ pip install torch-scatter==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
    $ pip install torch-sparse==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
    $ pip install torch-cluster==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
    $ pip install torch-spline-conv==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
    $ pip install torch-geometric

    Here, ${CUDA} and ${TORCH} should be replaced by your specific CUDA version (cpu, cu92, cu101, cu102) and PyTorch version (1.4.0, 1.5.0, 1.6.0), respectively.

    N.B. Follow the instructions in the Torch-Geometric Docs to install the versions appropriate to your CUDA version.

  7. Install PyMOL and IPyMol:

    $ conda install -c schrodinger pymol
    $ git clone https://github.com/cxhernandez/ipymol
    $ cd ipymol
    $ pip install . 

    N.B. The PyPI package appears to lag behind the GitHub repository. Mesh construction requires functionality that is not present in the PyPI release, so install IPyMol from source as shown above.

  8. Install graphein:

    $ git clone https://www.github.com/a-r-j/graphein
    $ cd graphein
    $ pip install -e .
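
After working through the steps above, a quick check of which dependencies are importable in the active environment can save some debugging. The sketch below is a suggestion rather than part of Graphein. The import names in `REQUIRED` (for example `Bio` for Biopython and `torch_geometric` for PyTorch Geometric) are assumed to be the usual module names for these packages, and `check_installation` is a hypothetical helper.

```python
import importlib.util

# Package name -> assumed top-level import name.
REQUIRED = {
    "biopython": "Bio",
    "rdkit": "rdkit",
    "torch": "torch",
    "dgl": "dgl",
    "dgllife": "dgllife",
    "torch-geometric": "torch_geometric",
    "ipymol": "ipymol",
    "graphein": "graphein",
}


def check_installation(modules=REQUIRED):
    """Map each package to True if its module can be located, else False."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


status = check_installation()
for package, found in status.items():
    print(f"{package:<16}{'ok' if found else 'MISSING'}")
```

`find_spec` only locates a module without importing it, so the check stays fast and does not fail on packages that are installed but broken. Command-line tools such as GetContacts and DSSP are not covered.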

Citing Graphein

Please consider citing graphein if it proves useful in your work.

@article{Jamasb2020,
  doi = {10.1101/2020.07.15.204701},
  url = {https://doi.org/10.1101/2020.07.15.204701},
  year = {2020},
  month = jul,
  publisher = {Cold Spring Harbor Laboratory},
  author = {Arian Rokkum Jamasb and Pietro Lio and Tom Blundell},
  title = {Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Protein Structures}
}