/dMaSIF

Primary LanguageJupyter NotebookOtherNOASSERTION

dMaSIF - Fast end-to-end learning on protein surfaces

Method overview

Abstract

Proteins’ biological functions are defined by the geometric and chemical structure of their 3D molecular surfaces. Recent works have shown that geometric deep learning can be used on mesh-based representations of proteins to identify potential functional sites, such as binding targets for potential drugs. Unfortunately though, the use of meshes as the underlying representation for protein structure has multiple drawbacks including the need to pre-compute the input features and mesh connectivities. This becomes a bottleneck for many important tasks in protein science.

In this paper, we present a new framework for deep learning on protein structures that addresses these limitations. Among the key advantages of our method are the computation and sampling of the molecular surface on-the-fly from the underlying atomic point cloud and a novel efficient geometric convolutional layer. As a result, we are able to process large collections of proteins in an end-to-end fashion, taking as the sole input the raw 3D coordinates and chemical types of their atoms, eliminating the need for any hand-crafted pre-computed features.

To showcase the performance of our approach, we test it on two tasks in the field of protein structural bioinformatics: the identification of interaction sites and the prediction of protein-protein interactions. On both tasks, we achieve state-of-the-art performance with much faster run times and fewer parameters than previous models. These results will considerably ease the deployment of deep learning methods in protein science and open the door for end-to-end differentiable approaches in protein modeling tasks such as function prediction and design.

Hardware requirements

Models have been trained on either a single NVIDIA RTX 2080 Ti or a single Tesla V100 GPU. Time and memory benchmarks were performed on a single Tesla V100.

Software prerequisites

Scripts have been tested using the following two sets of core dependencies:

Dependency First Option Second Option
GCC 7.5.0 8.4.0
CMAKE 3.10.2 3.16.5
CUDA 10.0.130 10.2.89
cuDNN 7.6.4.38 7.6.5.32
Python 3.6.9 3.7.7
PyTorch 1.4.0 1.6.0
PyKeops 1.4 1.4.1
PyTorch Geometric 1.5.0 1.6.1

Code overview

Usage:

  • In order to train models, run main_training.py with the appropriate flags. Available flags and their descriptions can be found in Arguments.py.

  • The command line options needed to reproduce the benchmarks can be found in benchmark_scripts/.

  • To make inference on the testing set using pretrained models, use main_inference.py with the flags that were used for training the models. Note that the --experiment_name flag should be modified to specify the training epoch to use.

Implementation:

  • Our surface generation algorithm, curvature estimation method and quasi-geodesic convolutions are implemented in geometry_processing.py.

  • The definition of the neural network along with surface and input features can be found in model.py. The convolutional layers are implemented in benchmark_models.py.

  • The scripts used to generate the figures of the paper can be found in data_analysis/.

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Reference

Sverrisson, F., Feydy, J., Correia, B. E., & Bronstein, M. M. (2020). Fast end-to-end learning on protein surfaces. bioRxiv.