/ligand-classification

Project examining sparse deep learning architectures for ligand classification.

Primary language: Jupyter Notebook


Ligand Identification in CryoEM and X-ray Maps Using Deep Learning

Jacek Karolczak, Anna Przybyłowska, Konrad Szewczyk, Witold Taisner, John M. Heumann, Michael H.B. Stowell, Michał Nowicki, Dariusz Brzezinski

Accurately identifying ligands plays a crucial role in structure-guided drug design. Based on density maps from X-ray diffraction or cryogenic-sample electron microscopy (cryoEM), scientists verify whether small-molecule ligands bind to active sites. However, the interpretation of density maps is challenging, and cognitive bias can sometimes mislead investigators into modeling fictitious compounds. Ligand identification can be aided by automatic methods, but existing approaches are available only for X-ray diffraction. Here, we propose to identify ligands using a deep learning approach that treats density maps as 3D point clouds. We show that the proposed model is on par with existing methods for X-ray crystallography while also being applicable to cryoEM density maps. Our study demonstrates that electron density map fragments can be used to train models that can be applied to cryoEM structures, but also highlights challenges associated with the standardization of electron microscopy maps and the quality assessment of cryoEM ligands.

In the repository, we provide the code for the experiments conducted in the paper, including model implementations and transformations for generating datasets. To reproduce the results, use scripts from the scripts directory. Configuration files for the experiments are available in the cfg directory.

We provide weights of the model trained on cryoEM and X-ray crystallography as model.pt (link).


Presented below are schematics of deep learning architectures used to predict ligands:

  1. The RIConv++ architecture with five enhanced rotation-invariant convolution (RIConv++) layers.
  2. The MinkLoc3Dv2 architecture utilizing information from a pyramid of three feature maps with different receptive fields.
  3. The TransLoc3D architecture built from four modules: 3D Sparse Convolution, Adaptive Receptive Field, External Transformer, and NetVLAD.

All the architectures were modified to take as input the same sample of 2000 voxels (or fewer when a ligand is described by a smaller number of voxels) and to output probability scores for all 219 studied ligand groups.
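The input-size cap described above can be illustrated with a minimal sketch. It assumes a blob is already available as a list of (x, y, z, density) tuples and that surplus voxels are dropped by uniform random sampling; the name `cap_voxels` and the fixed seed are hypothetical, and the repository's own sampling code may differ.

```python
import random
from typing import List, Tuple

Voxel = Tuple[float, float, float, float]  # (x, y, z, density)

MAX_VOXELS = 2000  # input size stated in the text


def cap_voxels(voxels: List[Voxel], limit: int = MAX_VOXELS, seed: int = 0) -> List[Voxel]:
    """Return at most `limit` voxels; smaller blobs are returned unchanged."""
    if len(voxels) <= limit:
        return list(voxels)
    # Assumption: uniform sampling without replacement.
    rng = random.Random(seed)
    return rng.sample(voxels, limit)


# Synthetic blobs, used only to show the two cases.
small = [(float(i), 0.0, 0.0, 1.0) for i in range(150)]
large = [(float(i), 0.0, 0.0, 1.0) for i in range(5000)]
print(len(cap_voxels(small)), len(cap_voxels(large)))
```

A blob below the limit passes through untouched, so the model input can hold fewer than 2000 voxels, as noted in the text.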

Deep Learning Architectures Schematics


Here are some snapshots of ligand identifications made by the proposed MinkLoc3Dv2 model.

  • (A–D) Examples of correctly predicted X-ray ligands.
  • (E) Uridine-5’-diphosphate (UDP) misclassified as uridine (URI, black dashed frame).
  • (F–I) Examples of correctly predicted cryoEM ligands.
  • (J) Heme A (HEM) misclassified as a rare ligand due to incorrect density thresholding.

Blobs Identified by MinkLoc3Dv2

Each ligand is labeled by its Chemical Component Dictionary ID, structure resolution, and (in parentheses) the PDB ID, chain, and residue number. X-ray diffraction ligands are shown in green mesh based on Fo-Fc maps contoured at 2.8σ, calculated after removal of solvent and other small molecules (including the ligand) from the model. CryoEM ligands are depicted in pink mesh based on difference maps contoured according to the proposed automatic density thresholding method (13.642, 3.385, 17.997, 7.850, and 5.613 V for panels F–J, respectively). The white mesh in panel J shows a manually selected contour threshold of 11.000 V. Atomic coordinates were taken from the PDB deposits.


Demo

The model trained on blobs from cryoEM and X-ray crystallography can be tested without the need to install anything. The model is deployed as a Streamlit app under the link ligands.cs.put.poznan.pl.


API

The Ligand Classification API provides endpoints for classifying ligands from 3D point cloud data using a model trained on all the data mentioned in the paper, including blobs from cryoEM and X-ray crystallography. The API supports various file formats for point cloud input and returns the top 10 predicted ligand classes along with their probabilities.
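As a rough illustration of what "top 10 predicted classes with probabilities" means, the sketch below turns raw class scores into probabilities with a softmax and keeps the highest-ranked entries. It is an assumption about the general technique, not the service's actual code; `softmax`, `top_k`, and the three-label example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities: Sequence[float], labels: Sequence[str], k: int = 10) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy example with three classes instead of the 219 ligand groups.
labels = ["ATP", "GTP", "HEM"]
probs = softmax([2.0, 0.5, -1.0])
for label, prob in top_k(probs, labels, k=2):
    print(f"{label}: {prob:.3f}")
```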

Each user is limited to one request per second.
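A client that sends several blobs in a row may want to pace itself to stay within this limit. The sketch below is one possible client-side throttle; the `Throttle` class is hypothetical and is not part of the API or the repository.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Keep consecutive calls at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float = 1.0,  # one request per second, as stated in the text
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time at which the previous call was released

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each request is enough; the clock and sleep functions are injectable so the behaviour can be checked without real delays.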

Base URL: http://ligands.cs.put.poznan.pl

Endpoints

1. Health Check - [GET] http://ligands.cs.put.poznan.pl/api

Checks if the Ligand Classification API is operational.

Responses
  • 200: Success

    Example response:

    Ligand Classification API is up and running!
    
  • 500: Server Error

    Error Details
    {
      "error": "Internal Server Error"
    }
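A health check from Python could look like the following sketch, which uses only the standard library. The helper names `build_health_request` and `is_api_up` are hypothetical; the URL is the one given above.

```python
import urllib.request

BASE_URL = "http://ligands.cs.put.poznan.pl"  # base URL from the text


def build_health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Create the GET request for the health-check endpoint."""
    return urllib.request.Request(f"{base_url}/api", method="GET")


def is_api_up(timeout: float = 10.0) -> bool:
    """Return True when the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(build_health_request(), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError, HTTPError (e.g. a 500 response), and socket timeouts.
        return False


# Usage (performs a network call):
#     print("API up:", is_api_up())
```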

2. Classify Ligand - [POST] http://ligands.cs.put.poznan.pl/api/predict

Classifies the uploaded 3D point cloud data and returns the top 10 most likely ligand classes along with their respective probabilities.

Request Body
  • file (string, binary, required):
    Supported formats: .npy, .npz, .pts, .xyz, .txt, .csv

    • .npy, .npz:
      • dense three-dimensional NumPy array
    • .xyz, .txt:
      • no header
      • each line describes a voxel following the pattern x y z density
    • .pts:
      • the first line contains the number of points (lines)
      • each subsequent line describes a voxel following the pattern x y z density
    • .csv:
      • files with and without headers are supported
      • each line describes a voxel following the pattern x, y, z, density

    Example: example.npy

  • rescale_cryoem (string, optional):
    Indicates whether to rescale the cryoEM data. Accepts "true" or "false".
    Example: "false"

  • resolution (number, optional):
    The resolution value for cryoEM data rescaling. Required if rescale_cryoem is "true".
    Example: 1.5
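The text-based formats listed above can be produced with a few lines of standard-library Python. This is a minimal sketch based only on the format descriptions in this section; the function names and the CSV column names are assumptions, and the exact number formatting accepted by the service is not specified here.

```python
import csv
import io
from typing import Iterable, List, Tuple

Voxel = Tuple[float, float, float, float]  # (x, y, z, density)


def to_xyz(voxels: Iterable[Voxel]) -> str:
    """.xyz / .txt: no header, one 'x y z density' line per voxel."""
    return "".join(f"{x} {y} {z} {d}\n" for x, y, z, d in voxels)


def to_pts(voxels: List[Voxel]) -> str:
    """.pts: first line holds the number of points, then 'x y z density' lines."""
    return f"{len(voxels)}\n" + to_xyz(voxels)


def to_csv(voxels: Iterable[Voxel], header: bool = True) -> str:
    """.csv: comma-separated values, with or without a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    if header:
        writer.writerow(["x", "y", "z", "density"])  # assumed column names
    writer.writerows(voxels)
    return buf.getvalue()


# Two made-up voxels, used only to show the layout.
blob = [(0, 0, 0, 0.5), (1, 0, 2, 1.25)]
print(to_pts(blob), end="")
```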

Responses
  • 200: Successfully classified ligand

    Example response:

    {
      "predictions": [
        {
          "Class": "ATP",
          "Probability": 0.82
        },
        {
          "Class": "GTP",
          "Probability": 0.07
        },
        ...
      ]
    }
  • 400: Bad Request (Multiple Possible Errors)

    • No file part in the request

      Error Details
      {
        "error": "No file part in the request"
      }
    • Unsupported file format

      Error Details
      {
        "error": "Unsupported file format"
      }
    • Missing resolution when rescale_cryoem is true

      Error Details
      {
        "error": "No resolution part in the request"
      }
  • 500: Internal Server Error

    Error Details
    {
      "error": "An unexpected error occurred"
    }
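The request and response shapes above can be tied together in a small client sketch. It assumes the endpoint accepts a multipart/form-data upload (suggested by the "No file part in the request" error, but not stated explicitly), and every helper name here is hypothetical.

```python
import json
import urllib.request
import uuid
from typing import Dict, List, Optional, Tuple

PREDICT_URL = "http://ligands.cs.put.poznan.pl/api/predict"


def encode_multipart(fields: Dict[str, str], filename: str, content: bytes) -> Tuple[bytes, str]:
    """Encode form fields plus one file part; return (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_head = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_head.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_predict_request(
    filename: str, content: bytes, resolution: Optional[float] = None
) -> urllib.request.Request:
    """Build the POST request; passing a resolution turns cryoEM rescaling on."""
    fields = {"rescale_cryoem": "false"}
    if resolution is not None:
        # resolution is required whenever rescale_cryoem is "true"
        fields = {"rescale_cryoem": "true", "resolution": str(resolution)}
    body, content_type = encode_multipart(fields, filename, content)
    return urllib.request.Request(
        PREDICT_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )


def handle_response(status: int, payload: str) -> List[Tuple[str, float]]:
    """Return (class, probability) pairs on 200; raise with the API's error text otherwise."""
    data = json.loads(payload)
    if status == 200:
        return [(p["Class"], p["Probability"]) for p in data["predictions"]]
    raise RuntimeError(f"HTTP {status}: {data.get('error', 'unknown error')}")


# Usage (performs a network call; urlopen raises HTTPError for 4xx/5xx responses):
#     req = build_predict_request("example.npy", open("example.npy", "rb").read())
#     with urllib.request.urlopen(req) as resp:
#         print(handle_response(resp.status, resp.read().decode()))
```

Keeping request construction and response parsing separate from the network call makes both easy to check offline against the examples in this section.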

Environment setup

Docker

To simplify the setup and ensure consistency, we provide a Docker configuration that includes all necessary dependencies.

Prerequisites

Ensure you have the following installed:

Steps to Start

  1. Clone this repository.
  2. Set the necessary permissions: sudo chmod 744 ./start.sh ./stop.sh
  3. Configure the environment by editing the docker/.env file:
    • Adjust PYTORCH, CUDA, and CUDNN settings if needed (for GPU use).
    • Set the DATA_PATH to point to your data directory. Default is ../../data/.
  4. Start the container:
    • For GPU use: ./start.sh
    • For CPU use: ./start.sh cpu
  5. To stop the container:
    • For GPU use: ./stop.sh
    • For CPU use: ./stop.sh cpu

Data

All the data necessary to reproduce the results are available at Zenodo.

The repository with code for extracting ligands from cryoEM difference maps is a submodule of this repository, but can also be found here.

Additionally, the preprocessed data (uniformly sampled and max pooled 2000 points per ligand) that were used to train the final model are available here.

Citation

@article {Karolczak2024.08.27.610022,
	author = {Karolczak, Jacek and Przyby{\l}owska, Anna and Szewczyk, Konrad and Taisner, Witold and Heumann, John M. and Stowell, Michael H.B. and Nowicki, Micha{\l} and Brzezinski, Dariusz},
	title = {Ligand Identification using Deep Learning},
	elocation-id = {2024.08.27.610022},
	year = {2024},
	doi = {10.1101/2024.08.27.610022},
	publisher = {Cold Spring Harbor Laboratory},
	abstract = {Motivation: Accurately identifying ligands plays a crucial role in the process of structure-guided drug design. Based on density maps from X-ray diffraction or cryogenic-sample electron microscopy (cryoEM), scientists verify whether small-molecule ligands bind to active sites of interest. However, the interpretation of density maps is challenging, and cognitive bias can sometimes mislead investigators into modeling fictitious compounds. Ligand identification can be aided by automatic methods, but existing approaches are available only for X-ray diffraction and are based on iterative fitting or feature-engineered machine learning rather than end-to-end deep learning. Results: Here, we propose to identify ligands using a deep learning approach that treats density maps as 3D point clouds. We show that the proposed model is on par with existing machine learning methods for X-ray crystallography while also being applicable to cryoEM density maps. Our study demonstrates that electron density map fragments can be used to train models that can be applied to cryoEM structures, but also highlights challenges associated with the standardization of electron microscopy maps and the quality assessment of cryoEM ligands. Availability: Code and model weights are available on GitHub at https://github.com/jkarolczak/ligands-classification. Datasets used for training and testing are hosted at Zenodo: 10.5281/zenodo.10908325. Contact: dariusz.brzezinski{at}cs.put.poznan.pl. Competing Interest Statement: The authors have declared no competing interest.},
	URL = {https://www.biorxiv.org/content/early/2024/08/28/2024.08.27.610022},
	eprint = {https://www.biorxiv.org/content/early/2024/08/28/2024.08.27.610022.full.pdf},
	journal = {bioRxiv}
}