Protein Graph Library

Primary language: Python. License: MIT.

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Graphein

This package produces several types of graph-based representations of proteins. It is compatible with standard formats and also provides graph objects designed for ease of use in deep learning.

Example usage

from graphein.construct_graphs import ProteinGraph

# Initialise ProteinGraph class
pg = ProteinGraph(granularity='CA', insertions=False, keep_hets=True,
                  node_featuriser='meiler', get_contacts_path='/Users/arianjamasb/github/getcontacts',
                  pdb_dir='examples/pdbs/',
                  contacts_dir='examples/contacts/',
                  exclude_waters=True, covalent_bonds=False, include_ss=True)

# Create graph. Chain selection is either 'all' or a list e.g. ['A', 'B', 'D'] specifying the polypeptide chains to capture
graph = pg.dgl_graph('3eiy', chain_selection='all')
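To build graphs for several structures with the same settings, the call above can be wrapped in a small loop. The following is a minimal sketch and not part of Graphein: `check_chain_selection` and `build_graphs` are hypothetical helper names, and the only Graphein call assumed is `dgl_graph(pdb_code, chain_selection=...)` as used in the example above.

```python
from typing import Any, Dict, Iterable, List, Union

ChainSelection = Union[str, List[str]]


def check_chain_selection(chain_selection: ChainSelection) -> ChainSelection:
    """Accept 'all' or a non-empty list of chain IDs such as ['A', 'B'].

    Hypothetical helper: it mirrors the two forms described above and
    rejects anything else before a graph is built.
    """
    if chain_selection == "all":
        return chain_selection
    if (
        isinstance(chain_selection, (list, tuple))
        and chain_selection
        and all(isinstance(c, str) and c for c in chain_selection)
    ):
        return list(chain_selection)
    raise ValueError("chain_selection must be 'all' or a non-empty list of chain IDs")


def build_graphs(
    pg: Any,
    pdb_codes: Iterable[str],
    chain_selection: ChainSelection = "all",
) -> Dict[str, Any]:
    """Build one graph per PDB code using an already configured object.

    `pg` is assumed to expose dgl_graph(pdb_code, chain_selection=...),
    as the ProteinGraph instance does in the example above.
    """
    selection = check_chain_selection(chain_selection)
    return {code: pg.dgl_graph(code, chain_selection=selection) for code in pdb_codes}
```

With the `pg` object from the example, `build_graphs(pg, ['3eiy'], chain_selection=['A'])` would return a dictionary keyed by PDB code. A bare string such as `'A'` is rejected on purpose, since only `'all'` or a list is described above.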

Parameters

granularity: {'CA', 'CB', 'atom'} - specifies node-level granularity of graph
insertions: bool - keep atoms with multiple insertion positions
keep_hets: bool - keep hetatoms
node_featuriser: {'meiler', 'kidera'} - low-dimensional embeddings of amino acid physico-chemical properties
pdb_dir: path to PDB files
contacts_dir: path to contact files generated by GetContacts
get_contacts_path: path to GetContacts installation
exclude_waters: bool - exclude structural waters (set to False to retain them)
covalent_bonds: bool - include covalent bond edges (True) or use only intramolecular interactions (False)
include_ss: bool - calculate protein secondary structure (SS) and surface features using DSSP and assign them as node features
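The allowed values listed above can be checked before a `ProteinGraph` is constructed. This is a minimal sketch under stated assumptions: `validate_options` is a hypothetical helper, not a Graphein function, and it encodes only the choices and types given in the parameter list.

```python
from typing import Any, Dict

# Allowed values copied from the parameter list above.
ALLOWED_GRANULARITY = {"CA", "CB", "atom"}
ALLOWED_FEATURISERS = {"meiler", "kidera"}
BOOLEAN_OPTIONS = (
    "insertions",
    "keep_hets",
    "exclude_waters",
    "covalent_bonds",
    "include_ss",
)


def validate_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical pre-flight check for ProteinGraph keyword arguments.

    Requires 'granularity' and 'node_featuriser' to be present and valid,
    and checks that any boolean option supplied is really a bool.
    """
    if options.get("granularity") not in ALLOWED_GRANULARITY:
        raise ValueError(f"granularity must be one of {sorted(ALLOWED_GRANULARITY)}")
    if options.get("node_featuriser") not in ALLOWED_FEATURISERS:
        raise ValueError(f"node_featuriser must be one of {sorted(ALLOWED_FEATURISERS)}")
    for name in BOOLEAN_OPTIONS:
        if name in options and not isinstance(options[name], bool):
            raise TypeError(f"{name} must be a bool")
    return options
```

Assuming the constructor accepts these names as keywords, as in the usage example, the checked dictionary could then be passed on with `ProteinGraph(**validate_options(options))`.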

Installation

Create a conda environment

conda create --name graphein
conda activate graphein
  1. Install vmd-python

    conda install -c conda-forge vmd-python

  2. Install Get Contacts

     # Install get_contact_ticc.py dependencies
     $ conda install scipy numpy scikit-learn matplotlib pandas cython seaborn
     $ pip install ticc==0.1.4
      
     # Install vmd-python dependencies
     $ conda install netcdf4 numpy pandas seaborn expat tk=8.5  # Alternatively use pip
     $ brew install netcdf pyqt # Assumes https://brew.sh/ is installed
    
     # Set up vmd-python library
     $ git clone https://github.com/Eigenstate/vmd-python.git
     $ cd vmd-python
     $ python setup.py build
     $ python setup.py install
     $ cd ..
    
     # Set up getcontacts library
     $ git clone https://github.com/getcontacts/getcontacts.git
     $ echo "export PATH=`pwd`/getcontacts:\$PATH" >> ~/.bash_profile
     $ source ~/.bash_profile
    
     # Test installation
     $ cd getcontacts/example/5xnd
     $ get_dynamic_contacts.py --topology 5xnd_topology.pdb \
                               --trajectory 5xnd_trajectory.dcd \
                               --itypes hb \
                               --output 5xnd_hbonds.tsv
    
  3. Install DSSP

    conda install -c salilab dssp

  4. Install graphein

     $ git clone https://www.github.com/a-r-j/graphein
     $ cd graphein
     $ pip install -e .
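
Once the steps above are complete, a quick check can confirm that the pieces are visible from the active environment. The sketch below is not part of Graphein. It assumes that vmd-python is importable as `vmd`, that this package is importable as `graphein`, that `get_dynamic_contacts.py` is on `PATH` as set up above, and that the DSSP executable is named `mkdssp`. Adjust these names if your installation differs.

```python
import importlib.util
import shutil
from typing import Dict, Sequence

# Assumed names; change them if your installation uses different ones.
PYTHON_MODULES = ("vmd", "graphein")
EXECUTABLES = ("get_dynamic_contacts.py", "mkdssp")


def check_installation(
    modules: Sequence[str] = PYTHON_MODULES,
    executables: Sequence[str] = EXECUTABLES,
) -> Dict[str, bool]:
    """Report which top-level modules are importable and which executables are on PATH."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        status[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for name in executables:
        # shutil.which returns None when the executable is not on PATH.
        status[f"executable:{name}"] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for item, found in check_installation().items():
        print(f"{'ok' if found else 'MISSING':8s}{item}")
```

Running the script inside the `graphein` environment prints one line per dependency. A `MISSING` entry points to the installation step that needs repeating.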