Gradientless, integrity-validating molecular optimization framework

A transformer-based molecular optimization framework. Optimize any molecule using an arbitrary scoring function while maintaining its integrity and purpose.

Framework overview

The optimization framework consists of three elements: attention loss, latent space distance, and a user-defined scoring function.

Attention loss helps maintain the integrity of the molecule. Chemformer's attention heads penalize modifications that lead to an invalid molecule.
Latent space distance helps us stick to the purpose of the original molecule: it measures the semantic difference between the modified molecule and the source one.
The user-defined objective can be any function that accepts a SMILES string and returns a scalar.
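
As a rough illustration of how the three elements could fit together, the sketch below ranks a candidate by a weighted sum. It is not taken from the repository: the names `combined_objective` and `count_nitrogens` and the weights `alpha` and `beta` are hypothetical, and the notebook may combine the terms differently.

```python
from typing import Callable

# Any callable that maps a SMILES string to a scalar can act as the objective.
ScoreFn = Callable[[str], float]


def count_nitrogens(smiles: str) -> float:
    """Toy scoring function (hypothetical): counts the characters 'N' and 'n'."""
    return float(sum(ch in "Nn" for ch in smiles))


def combined_objective(
    smiles: str,
    score_fn: ScoreFn,
    attention_loss: float,
    latent_distance: float,
    alpha: float = 1.0,  # assumed weight of the integrity penalty
    beta: float = 1.0,   # assumed weight of the purpose-drift penalty
) -> float:
    """Higher is better: reward the user score, penalize both loss terms."""
    return score_fn(smiles) - alpha * attention_loss - beta * latent_distance


# The loss values here are made-up numbers, used only to show the call shape.
value = combined_objective("CCN", count_nitrogens, attention_loss=0.5, latent_distance=0.25)
```

In this sketch the attention loss and the latent distance are passed in as precomputed numbers, so the scoring function itself never needs to be differentiable.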

Architecture

Input: SMILES embedded using a Chemformer (Chemformer embedding.ipynb)
Candidate generation: handles reranking of the candidates, chooses the best of them, and runs them through the benchmark. (Optimize molecule.Sampler - currently only a greedy sampler is supported, feel free to add more!)
Attention loss: prediction of the most fitting modifications (Optimize molecule.MolecularOptimizer.get_transformer_ll)
Latent space distance: cosine distance between the embeddings of the source molecule and the candidate. Currently it's intertwined with the scoring function, sorry :/
Scoring: whatever you want it to be, get crazy. The Optimize molecule notebook shows two examples.
- The chance that a molecule will be a Mu-receptor antagonist (score_fn), based on neural network predictions (thanks to Dr. Sabina Podlewska for providing the data)
- The probability of drug-induced liver injury (DILI), based on an XGBoost model (gradient-less! Data from the CAMDA challenge 2020)
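
The latent space distance and the greedy selection step described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the notebook's code: `cosine_distance` and `greedy_step` are hypothetical names, and embeddings are assumed to be flat sequences of floats.

```python
import math
from typing import Callable, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def greedy_step(candidates: Sequence[str], objective: Callable[[str], float]) -> str:
    """Greedy selection (assumed behaviour): keep the single best-scoring candidate."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=objective)


# Toy usage: string length stands in for a real objective.
best = greedy_step(["C", "CC", "CCC"], objective=lambda smiles: float(len(smiles)))
```

A different sampler, such as beam search, would only need to replace `greedy_step`, because the objective is treated as a black box.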

Results

Sample optimizations of a molecule to be a better Mu-receptor antagonist (prediction format: [binding probability, 1-antagonist probability])

DILI chance optimization (don't pay too much attention to this one, we had a poor classifier)
