- MolProPred
- Message Passing Graph Neural Network
- Molecular Graph Neural Networks
- Introduction to SchNet
This project explores a Kaggle competition on predicting molecular properties. Along the way, we learn to use the SchNet graph neural network package to derive these properties. The main program is modeled after the 22nd-place solution on the leaderboard, with the aim of deepening our understanding. Our submission ranks approximately 1380th out of 2690 participating teams.
-
The project name was suggested by ChatGPT. (prompt: Please give me an ultra-cool project name, the project is for coping with an assignment given by a course 'machine learning for physicist', there are 3 team members, and the project is about predicting molecular properties.)
-
Just for fun and academic credit.
-
The AI tools used in this project included Genie-AI and GitHub Copilot for code testing, New Bing for literature search, Monica for writing assistance, and chatgpt_academic for recreational purposes.
Graph Neural Network (GNN) is a type of neural network that can operate on graph data. It has become increasingly popular in recent years due to its ability to model complex relationships between entities in a graph. Message passing is a fundamental operation in GNN, which enables the network to propagate information between nodes in the graph.
A graph is a data structure that consists of a set of nodes (or vertices) and a set of edges connecting these nodes. In Message Passing GNNs, the first step is to initialize the graph. The graph can be represented as an adjacency matrix
In addition to the adjacency matrix or edge list, we also need to initialize the node features. Each node has a feature vector that describes its properties. For example, in a social network analysis task, the node features can be the age, gender, and occupation of each user. In a drug discovery task, the node features can be the molecular properties of each atom in a molecule.
After initializing the graph and node features, we can start the message-passing process. The message-passing function is the core component of Message Passing GNNs. It aggregates information from neighboring nodes and updates the node features iteratively.
The aggregation operation can be formulated as follows, which aggregates messages from neighbor vertex:
$$ \vec{m}_i^{(k)}=\sum_{u_j \in \mathcal{N}\left(u_i\right)} \phi_m^{(k)}\left(\vec{h}_i^{(k-1)}, \vec{h}_j^{(k-1)}, \vec{a}_{ij}\right). $$
then the node feature is updated using the aggregated message:
$$ \vec{h}_i^{(k)}=\phi_u^{(k)}\left(\vec{h}_i^{(k-1)}, \vec{m}_i^{(k)}\right), $$
where the functions denoted by $\phi_m^{(k)}$ and $\phi_u^{(k)}$ are learnable (typically small neural networks), and $\vec{a}_{ij}$ is the feature vector of the edge between nodes $i$ and $j$.
After several rounds of message passing, the final step is to perform a readout operation to obtain a graph-level representation. The readout operation aggregates the node features into a single vector, which represents the entire graph,
There are various methods for readout operation, including sum pooling, max pooling, and attention-based pooling. The choice of readout operation depends on the task at hand and the characteristics of the graph.
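The message-passing and readout steps above can be sketched numerically. The `tanh` update and random weight matrices below are illustrative stand-ins for the learned functions $\phi_m$ and $\phi_u$ (here $\phi_m$ depends only on the neighbor's feature, and the readout is sum pooling):

```python
import numpy as np

# Toy graph: a triangle of 3 nodes, each with a 4-dimensional feature vector.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)   # adjacency matrix

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))              # node features

# One round of message passing: aggregate neighbor messages, then update.
W_msg = rng.normal(size=(4, 4))          # hypothetical message-transform weights
W_upd = rng.normal(size=(4, 4))          # hypothetical update weights

M = A @ (H @ W_msg)                      # m_i = sum over neighbors j of phi_m(h_j)
H_new = np.tanh(H @ W_upd + M)           # h_i <- phi_u(h_i, m_i)

g = H_new.sum(axis=0)                    # sum-pooling readout: graph-level vector
```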
The simplest way to represent a molecule is as a 1D string of characters, where the order of the characters encodes how the atoms are connected. For example, the string "C1=CC=CC=C1" represents benzene: each "C" is a carbon atom, each "=" marks a double bond, and the paired digits "1" close the ring.
Algorithms utilizing such methodologies include:
- Simplified molecular-input line-entry system (SMILES)
- SMILES arbitrary target specification (SMARTS)
- Self-referencing embedded strings (SELFIES)
A more powerful representation of a molecule is as a 2D graph. In this representation, each atom is a node in the graph, and each bond between atoms is an edge. The type of bond (single, double, etc.) can be represented as a label on the edge. One common way to represent molecules as 2D graphs is through the use of adjacency matrices. An adjacency matrix is a square matrix that represents the connections between nodes in a graph. In the case of molecules, the adjacency matrix represents the bonds between atoms.
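As a concrete illustration, the adjacency matrix of the benzene ring (hydrogens omitted, and ignoring bond-order labels) can be built by hand:

```python
import numpy as np

# Hand-built adjacency matrix for the benzene ring "C1=CC=CC=C1"
# (hydrogens omitted): carbon k is bonded to its two ring neighbors.
n = 6
A = np.zeros((n, n), dtype=int)
for k in range(n):
    A[k, (k + 1) % n] = 1   # bond to the next ring atom
    A[(k + 1) % n, k] = 1   # undirected graph: the matrix is symmetric
```

Bond types (single vs. double) would be stored as edge labels, e.g. in a parallel matrix of bond orders.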
2D graph representations capture only the topology of a molecule; they cannot convey geometric properties in 3D Euclidean space such as interatomic distances and bond angles.
In addition to representing molecules as 2D graphs, it is also possible to represent them in 3D Euclidean space. This can be useful for modeling the spatial arrangement of atoms in a molecule.
The SchNet architecture used in our project contains carefully designed layers that capture local correlations and update atom-wise features through continuous-filter convolutions. By effectively encoding 3D distance information into molecular GNNs, SchNet has inspired numerous subsequent works.
SchNet models atomistic systems by making use of continuous-filter convolutional layers, which model the interactions between atoms.
This allows it to capture complex atomic interactions and to:
- Predict potential energy surfaces.
- Speed up the exploration of chemical space.
Consider fundamental symmetries of atomistic systems.
- Rotational and translational invariance as well as invariance to atom indexing.
SchNet is a variant of the earlier proposed Deep Tensor Neural Networks (DTNN).
- DTNN: interactions are modeled by tensor layers, i.e., atom representations and interatomic distances are combined using a parameter tensor.
- SchNet: makes use of continuous-filter convolutions with filter-generating networks to model the interaction term.
At each layer, the molecule is represented atom-wise analogous to pixels in an image.
Interactions between atoms are modeled by the three interaction blocks.
The final prediction is obtained after atom-wise updates of the feature representation and pooling of the resulting atom-wise energy.
A molecule with $n$ atoms is described by:
- Nuclear charges $Z=(Z_1,\ldots,Z_n)$
- Positions $R=(\mathbf r_1,\ldots,\mathbf r_n)$

At layer $l$, the atoms are described by a tuple of features $X^l=(\mathbf x_1^l,\ldots,\mathbf x_n^l)$:
- $\mathbf x_i^l \in \mathbb{R}^F$, $F$: number of feature maps
- $n$: number of atoms
- $l$: current layer

$\mathbf x_i^0$ is initialized using an embedding dependent on the atom type $Z_i$:
$$ \mathbf{x}_{i}^{0}=\mathbf{a}_{Z_i}. $$
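A minimal sketch of this embedding lookup; the feature dimension and the set of supported atom types below are arbitrary illustrative choices:

```python
import numpy as np

# Atom-type embedding lookup: x_i^0 = a_{Z_i}.
F = 8                                        # number of feature maps (illustrative)
max_Z = 10                                   # support elements up to neon (illustrative)
rng = np.random.default_rng(0)
a = rng.normal(size=(max_Z + 1, F))          # one learnable row per atom type

Z = np.array([6, 1, 1, 1, 1])                # e.g. methane: C, H, H, H, H
X0 = a[Z]                                    # initial atom-wise features, shape (n_atoms, F)
```

Atoms of the same type share one embedding row, which is optimized during training.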
The atom type embeddings $\mathbf{a}_{Z}$ are learned during training.

Atom-wise layers are dense layers applied separately to the representation $\mathbf x_i^l$ of each atom:
$$ \mathbf{x}_{i}^{l+1}=W^l\mathbf{x}_{i}^{l}+\mathbf{b}^l, $$
- Weights $W^l$ and biases $\mathbf b^l$ are shared across atoms.
- The architecture remains scalable with respect to the number of atoms.

These layers are responsible for the recombination of feature maps.
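A sketch of such an atom-wise layer in NumPy (sizes are illustrative); the same $W^l$ and $\mathbf b^l$ act on every atom's row:

```python
import numpy as np

# Atom-wise layer: x_i^{l+1} = W x_i^l + b, with W and b shared across atoms.
rng = np.random.default_rng(0)
n_atoms, F = 5, 8                 # illustrative sizes
X = rng.normal(size=(n_atoms, F)) # atom-wise representations, one row per atom
W = rng.normal(size=(F, F))
b = rng.normal(size=F)

X_next = X @ W.T + b              # applied separately (row-wise) to each atom
```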
Interaction blocks update the atomic representations based on pair-wise interactions with the surrounding atoms. They use continuous-filter convolutional layers, a generalization of the discrete convolutional layers commonly used for images:
- Atoms are located at arbitrary positions, so filters defined on a discrete grid cannot be used.
- Instead, the filters are modeled continuously by a filter-generating neural network $W^l$:
$$ \begin{aligned} \mathbf{x}_{i}^{l+1}&=\left( X^l*W^l \right)_i \\ &=\sum_{j}\mathbf{x}_{j}^{l}\circ W^l\left( \mathbf{r}_j-\mathbf{r}_i \right), \end{aligned} $$
where $\circ$ denotes the element-wise (Hadamard) product.
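A naive sketch of this continuous-filter convolution; the toy radial `filter_net` below is a stand-in for SchNet's learned filter-generating network:

```python
import numpy as np

# Continuous-filter convolution: x_i <- sum_j x_j ∘ W(r_j - r_i).
rng = np.random.default_rng(0)
n_atoms, F = 4, 8
X = rng.normal(size=(n_atoms, F))     # atom-wise features
R = rng.normal(size=(n_atoms, 3))     # 3D positions

W_filt = rng.normal(size=F)

def filter_net(dr):
    """Map a displacement vector to an F-dimensional filter value."""
    d = np.linalg.norm(dr)            # use the distance, for rotational invariance
    return np.exp(-d**2) * W_filt     # toy radial filter, not SchNet's actual network

X_new = np.zeros_like(X)
for i in range(n_atoms):
    for j in range(n_atoms):
        X_new[i] += X[j] * filter_net(R[j] - R[i])  # element-wise product
```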
Activation function: the shifted softplus $\operatorname{ssp}(x)=\ln\left(\tfrac{1}{2}e^{x}+\tfrac{1}{2}\right)$:
- $\operatorname{ssp}(0)=0$.
- Improves the convergence of the network while being infinitely differentiable.
As a result, one obtains:
- smooth potential energy surfaces.
- force fields.
- second derivatives that are required for training with forces as well as the calculation of vibrational modes.
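The shifted softplus, $\operatorname{ssp}(x)=\ln(\tfrac12 e^x+\tfrac12)$, is simple to implement:

```python
import numpy as np

def ssp(x):
    """Shifted softplus activation: smooth, with ssp(0) = 0."""
    return np.log(0.5 * np.exp(x) + 0.5)
```

For large $x$ it behaves like $x-\ln 2$, i.e. asymptotically linear like ReLU, but with derivatives of all orders defined everywhere.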
The filter-generating network determines how the interactions between atoms are modeled; it can also be used to constrain the model and include chemical knowledge.
- Input: a fully-connected neural network takes the vector pointing from atom $i$ to atom $j$.
- Rotational invariance, a requirement for modeling molecular energies, is obtained by using the interatomic distance $d_{ij}=\lVert \mathbf r_j-\mathbf r_i \rVert$ instead.
- Fed with raw distances, the filters would be highly correlated, since a neural network after initialization is close to linear.
Therefore, expand the distances in a basis of Gaussians:
$$ e_k\left(\mathbf r_j-\mathbf r_i\right)=\exp\left(-\gamma\left(d_{ij}-\mu_k\right)^2\right), $$
- $\mu_k$: centers chosen on a uniform grid between zero and the distance cutoff.
- The number of Gaussians and the hyperparameter $\gamma$ determine the resolution of the expansion.
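A sketch of the Gaussian distance expansion; the cutoff, number of Gaussians, and $\gamma$ below are illustrative hyperparameter choices:

```python
import numpy as np

cutoff = 5.0                                 # distance cutoff (illustrative)
n_gaussians = 20                             # basis size (illustrative)
gamma = 10.0                                 # width hyperparameter (illustrative)
mu = np.linspace(0.0, cutoff, n_gaussians)   # centers on a uniform grid

def gaussian_expansion(d):
    """Expand an array of distances into (len(d), n_gaussians) features."""
    return np.exp(-gamma * (d[:, None] - mu[None, :]) ** 2)

E = gaussian_expansion(np.array([0.0, 1.1, 2.5]))
```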
For periodic systems, each atom-wise feature vector must be invariant with respect to the choice of unit cell. Given a filter $\tilde{W}^l(\mathbf{r}_{jn}-\mathbf{r}_{im})$ applied over all atoms with $\lVert \mathbf{r}_{jn}-\mathbf{r}_{im} \rVert < r_{\text{cut}}$:
$$ \begin{aligned} \mathbf{x}_{i}^{l+1}=\mathbf{x}_{im}^{l+1}&=\frac{1}{n_{\text{neighbors}}}\sum_{j,n}\mathbf{x}_{jn}^{l}\circ \tilde{W}^l\left( \mathbf{r}_{jn}-\mathbf{r}_{im} \right)\\ &=\frac{1}{n_{\text{neighbors}}}\sum_j \mathbf{x}_{j}^{l}\circ \underset{W^l}{\underbrace{\left( \sum_n \tilde{W}^l\left( \mathbf{r}_{jn}-\mathbf{r}_{im} \right) \right) }}, \end{aligned} $$
- $m, n$: unit cell indices.
Training is more stable when the filter response is normalized by the number of neighbors in this way.
From the final atom-wise representations, atom-wise output layers compute per-atom contributions, e.g. energies $E_i$. The final prediction is calculated by pooling (summing) over atoms: $E=\sum_{i=1}^{n} E_i$.
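The output stage (atom-wise contributions followed by sum pooling) can be sketched as follows; the single linear output layer is a simplification of the small output network:

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms, F = 5, 8
X = rng.normal(size=(n_atoms, F))   # final atom-wise representations

W_out = rng.normal(size=F)          # illustrative linear output layer
E_i = X @ W_out                     # atom-wise contributions, shape (n_atoms,)
E = E_i.sum()                       # final molecule-level prediction
```

Because the readout is a sum, the prediction is invariant to atom indexing: permuting the rows of `X` leaves `E` unchanged.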
As SchNet yields rotationally invariant energy predictions, the force predictions are rotationally equivariant by construction.
Atomic forces are predicted as the negative gradient of the predicted energy with respect to the atom positions:
$$ \hat{\mathbf F}_i=-\frac{\partial \hat E}{\partial \mathbf r_i}. $$
SchNet is trained separately for each property target.
Energies and forces can be trained jointly with a combined loss:
$$ \ell\left(\left(\hat{E}, \hat{\mathbf{F}}_1, \ldots, \hat{\mathbf{F}}_n\right),\left(E, \mathbf{F}_1, \ldots, \mathbf{F}_n\right)\right)=\rho\left\|E-\hat{E}\right\|^2+\frac{1}{n_{\text{atoms}}} \sum_{i=1}^{n_{\text{atoms}}}\left\|\mathbf{F}_i-\left(-\frac{\partial \hat{E}}{\partial \mathbf{r}_i}\right)\right\|^2. $$
- $\rho$: trade-off between energy and force loss.
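The combined loss can be written down directly, assuming the predicted forces $\hat{\mathbf F}_i$ have already been computed (e.g. by differentiating $\hat E$); the default $\rho$ below is an arbitrary illustrative value:

```python
import numpy as np

def combined_loss(E_hat, F_hat, E, F, rho=0.01):
    """Energy/force loss; rho trades off the two terms (value is illustrative)."""
    n_atoms = F.shape[0]
    energy_term = rho * (E - E_hat) ** 2
    force_term = np.sum((F - F_hat) ** 2) / n_atoms  # mean over atoms of ||F_i - F_hat_i||^2
    return energy_term + force_term
```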
In each experiment, we split the data into a training set of a given size and a validation set used for early stopping; the remaining data are used for computing the test errors.
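One common way to realize such a split (the set sizes below are illustrative):

```python
import numpy as np

def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle indices and split into train / validation / test sets."""
    idx = np.random.default_rng(seed).permutation(n_total)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])          # everything left over is the test set

train, val, test = split_indices(100, 60, 20)
```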