GNNforDrugDiscovery

SURA 2021

Primary language: Python. License: MIT.

graph.py

Contains the operations performed on molecular graphs. The methods in this file are:

  1. read_molecule_from_pubchem_id: Takes a PubChem ID and returns the networkx.Graph representation of the molecule.
  2. mol_to_nx: Takes a molecule as an rdkit.Mol object and returns its networkx.Graph representation.
  3. nx_to_mol: Takes the nx.Graph representation of a molecule and returns its rdkit.Mol representation.
  4. read_molecule_nx: Takes the canonical SMILES representation of a molecule and returns its nx.Graph representation.
  5. read_molecule_mol: Takes the canonical SMILES representation of a molecule and returns its rdkit.Mol representation.
  6. sequence_on_graph: Takes an nx.Graph and returns a node ordering and an edge ordering of the graph, sampled uniformly from the permutations of the vertices.
  7. sequence_on_graph_geometric: Takes an nx.Graph and returns a node ordering and an edge ordering of the graph, sampled from a geometric distribution over the permutations of the vertices. The details of the distribution are provided in a separate section of this file.
  8. construct_graph: Takes a node ordering and an edge ordering, and returns an nx.Graph that has the input ordering as one of its orderings.
  9. valid_molecule: Checks whether the input graph, given as an nx.Graph, is a valid molecule.
  10. generate_graph_from_sequence: Takes a node ordering and an edge ordering in the format output by sequence_on_graph, and returns an nx.Graph that has the input ordering as one of its orderings.
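
The ordering round trip described in items 6, 8 and 10 can be sketched in plain Python. This is a minimal illustration, not the code in graph.py: the function names, the adjacency-dict input, and the ordering format (for each node, the positions of the earlier nodes it bonds to) are all assumptions made for the example.

```python
import random


def sequence_on_graph_sketch(adj, rng=random):
    """Hypothetical stand-in for sequence_on_graph.

    adj maps each node to the set of its neighbours (undirected graph).
    Returns a node ordering and, for every node in that ordering, the
    positions of the earlier nodes it is connected to.
    """
    order = list(adj)
    rng.shuffle(order)  # uniform over all permutations of the vertices
    position = {node: i for i, node in enumerate(order)}
    edge_order = []
    for i, node in enumerate(order):
        # Keep only the edges that point back to already placed nodes.
        earlier = sorted(position[nbr] for nbr in adj[node] if position[nbr] < i)
        edge_order.append(earlier)
    return order, edge_order


def construct_graph_sketch(order, edge_order):
    """Hypothetical stand-in for construct_graph: rebuilds the adjacency dict."""
    adj = {node: set() for node in order}
    for node, earlier in zip(order, edge_order):
        for j in earlier:
            adj[node].add(order[j])
            adj[order[j]].add(node)
    return adj


# Heavy-atom skeleton of ethanol (C-C-O) as a toy input.
ethanol = {0: {1}, 1: {0, 2}, 2: {1}}
order, edge_order = sequence_on_graph_sketch(ethanol, random.Random(0))
rebuilt = construct_graph_sketch(order, edge_order)
```

Whatever permutation is drawn, rebuilding from the ordering recovers the original adjacency, which is the property items 8 and 10 describe.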

data.py

Contains conversion methods from rdkit.Mol to torch_geometric.data and from nx.Graph to torch_geometric.data, and vice versa. The methods and global variables in this file are:

  1. hybridization_types: A dictionary mapping each hybridization type to an index.
  2. chiral_types: A dictionary mapping each chiral type to an index.
  3. bond_types: A dictionary mapping each bond type to an index.
  4. bond_dirs: A dictionary mapping each bond direction to an index.
  5. bond_steroes: A dictionary mapping each bond stereo type to an index.
  6. torch_geom_to_mol: Takes a torch_geometric.data object and returns the rdkit.Mol representation of the molecule.
  7. nx_to_torch_geom: Takes an nx.Graph and returns the torch_geometric.data representation of the graph.
  8. mol_to_torch_geom: Takes an rdkit.Mol and returns the torch_geometric.data representation of the graph.
  9. read_graphs_from_datase: Reads a dataset stored locally in the directory given as input.
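
As a rough illustration of how an index dictionary such as bond_types can feed a torch_geometric-style graph, the sketch below encodes bonds as one-hot vectors and lays the connectivity out as two parallel lists (source nodes and target nodes), with each undirected bond stored in both directions. The dictionary keys and the helper names are hypothetical, and plain lists stand in for tensors; the actual code in data.py may differ.

```python
# Hypothetical index table; the real bond_types in data.py may use other keys.
bond_types_sketch = {"SINGLE": 0, "DOUBLE": 1, "TRIPLE": 2, "AROMATIC": 3}


def one_hot(index, size):
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def to_edge_lists(bonds):
    """bonds is a list of (atom_i, atom_j, bond_type_name) tuples.

    Returns ([sources, targets], edge_attr), the list form of a
    2 x num_edges connectivity array plus one feature row per edge.
    """
    sources, targets, edge_attr = [], [], []
    for i, j, name in bonds:
        feature = one_hot(bond_types_sketch[name], len(bond_types_sketch))
        # Store every undirected bond once per direction.
        sources += [i, j]
        targets += [j, i]
        edge_attr += [feature, feature]
    return [sources, targets], edge_attr


# Heavy atoms of acetaldehyde: a C-C single bond and a C=O double bond.
edge_index, edge_attr = to_edge_lists([(0, 1, "SINGLE"), (1, 2, "DOUBLE")])
```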

model.py

Contains the model implementation. The methods and classes are described below.

  1. MPN(MessagePassing): The message passing layer of our graph neural network.
  2. Graph_Representation(nn.Module): The propagation layer of our GNN.
  3. f_addnode(nn.Module): Neural network that outputs the probability of adding a node to the subgraph.
  4. f_add_edge(nn.Module): Neural network that outputs the probability of adding an edge between the subgraph and the most recently added node.
  5. f_nodes(nn.Module): Neural network that outputs a probability distribution over the other nodes of the subgraph, deciding which of them a new edge from the most recently added node connects to.
  6. Model(nn.Module): The final GNN model.
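
One way the three decision networks could interact during generation is sketched below. The README does not describe the control flow, so the loop is an assumption: decide whether to add a node, then repeatedly decide whether to add an edge to it, then pick the node to connect to. The three callables are hypothetical stand-ins for f_addnode, f_add_edge and f_nodes, replaced here by fixed probabilities and a uniform choice.

```python
import random


def generate_sketch(p_add_node, p_add_edge, pick_node, max_nodes=10, rng=random):
    """Grow a graph node by node using three decision functions.

    p_add_node(nodes, edges) -> probability of adding another node
    p_add_edge(nodes, edges) -> probability of adding an edge to the new node
    pick_node(candidates, rng) -> earlier node chosen as the other endpoint
    """
    nodes, edges = [], set()
    while len(nodes) < max_nodes and rng.random() < p_add_node(nodes, edges):
        new = len(nodes)
        nodes.append(new)
        candidates = list(range(new))  # nodes that existed before `new`
        while candidates and rng.random() < p_add_edge(nodes, edges):
            target = pick_node(candidates, rng)
            edges.add((target, new))
            candidates.remove(target)  # at most one edge per node pair
    return nodes, edges


# Constant probabilities and a uniform choice replace the trained networks.
nodes, edges = generate_sketch(
    p_add_node=lambda nodes, edges: 0.9,
    p_add_edge=lambda nodes, edges: 0.5,
    pick_node=lambda candidates, rng: rng.choice(candidates),
    rng=random.Random(0),
)
```

In a trained model the three callables would be computed from the current graph representation instead of being constants.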

Installation

Creating venv

sudo apt-get install python3.7-venv
python3.7 -m venv env

torch

source ./env/bin/activate
pip3 install torch
pip3 install -r requirements.txt

RDKit

Download

wget https://github.com/rdkit/rdkit/archive/Release_XXXX_XX_X.tar.gz

Extract the tar file, then install RDKit using the installer script from https://github.com/cyclica/rdkit-installer:

bash ./install-rdkit ./RDkit -e ./env/

Set the environment variables for RDKit.

RDBASE="<Path to RDkit folder>"
export RDBASE
LD_LIBRARY_PATH=$RDBASE/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
PYTHONPATH=$RDBASE:$PYTHONPATH
export PYTHONPATH
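
After setting the variables, a short script can confirm that the environment is wired up. The sketch below is not part of the repository: check_rdkit_setup is a hypothetical helper that only inspects RDBASE and asks Python whether an rdkit package can be located, without importing it.

```python
import importlib.util
import os


def check_rdkit_setup():
    """Report whether RDBASE is set and whether rdkit can be found."""
    rdbase = os.environ.get("RDBASE")
    return {
        "RDBASE set": bool(rdbase),
        "RDBASE is a directory": bool(rdbase) and os.path.isdir(rdbase),
        # find_spec locates the package on sys.path without importing it.
        "rdkit importable": importlib.util.find_spec("rdkit") is not None,
    }


report = check_rdkit_setup()
for name, ok in report.items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

Run it inside the activated venv; any line that prints "no" points at the variable or path to recheck.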