MXFP (Macromolecule eXtended FingerPrint) is a 217D fuzzy fingerprint that encodes atom pairs of 7 pharmacophore groups and is suitable for comparing large molecules and for scaffold hopping. In a first step, every atom is assigned to one or more of the following pharmacophore categories:
- Heavy Atoms (HAC)
- Hydrophobic Atoms (HYD)
- Aromatic Atoms (ARO)
- H-bond Acceptors (HBA)
- H-bond Donors (HBD)
- Positively Charged Atoms (POS)
- Negatively Charged Atoms (NEG)
For each pharmacophore category, all possible atom pairs are determined and converted to a Gaussian of 18% width centered at the atom pair topological (2D) or Euclidean (3D) distance. This Gaussian is then sampled at 31 distance bins.
Each of the obtained 31 Gaussian values is normalized to the sum of all 31 values.
The sum of normalized Gaussian contributions from all atom pairs of one pharmacophore category at a given distance bin is the basis for the fingerprint bit value of that bin.
The 31 fingerprint bit values from each of the 7 atom categories are finally joined, yielding the 217D fingerprint vector.
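The construction described above can be illustrated with a small NumPy sketch. It is not the package's implementation: the bin centres (`BIN_CENTRES`) are hypothetical placeholders, "18% width" is assumed to mean a standard deviation of 0.18 times the pair distance, and any further scaling of the summed values is left out. The helper names are made up for this example.

```python
import numpy as np

N_BINS = 31
CATEGORIES = ("HAC", "HYD", "ARO", "HBA", "HBD", "POS", "NEG")

# Hypothetical bin centres (assumption): zero plus a geometric spacing,
# chosen only so that the sketch has something to sample at.
BIN_CENTRES = np.concatenate(([0.0], np.geomspace(1.0, 300.0, N_BINS - 1)))


def pair_contribution(distance, rel_width=0.18):
    """Sample one atom-pair Gaussian at every bin and normalize to unit sum."""
    if distance <= 0:
        raise ValueError("this sketch only handles positive pair distances")
    sigma = rel_width * distance  # assumed reading of "18% width"
    values = np.exp(-0.5 * ((BIN_CENTRES - distance) / sigma) ** 2)
    return values / values.sum()


def category_block(distances):
    """Sum the normalized contributions of all atom pairs in one category."""
    block = np.zeros(N_BINS)
    for d in distances:
        block += pair_contribution(d)
    return block


def toy_fingerprint(distances_by_category):
    """Join the 31 values of each of the 7 categories into one 217D vector."""
    blocks = [category_block(distances_by_category.get(name, []))
              for name in CATEGORIES]
    return np.concatenate(blocks)


# Toy input: atom-pair distances per category (made-up numbers).
fp = toy_fingerprint({"HAC": [1.0, 2.0, 3.0], "HBA": [2.0]})
print(fp.shape)
```

Because each pair is normalized to a unit sum before being added, every atom pair contributes the same total weight to its category regardless of how many bins its Gaussian overlaps.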
You will need the following prerequisites:
There are several ways in which you can get started using MXFP:
To obtain a local copy of the project, clone the GitHub repository as follows:
git clone https://github.com/reymond-group/mxfp_python.git
To create a ready-to-use Conda environment, download the mxfp.yml file from the repository and run the following commands:
conda env create -f mxfp.yml
conda activate mxfp
To install mxfp in an existing Conda environment, activate the environment and install mxfp via pip using the following commands:
conda activate my_environment
pip install mxfp
In your Python script or Jupyter cell:
- Import the required libraries (RDKit, MXFP)
- Convert SMILES to rdchem.Mol object with RDKit (optional)
- Initialize the MXFPCalculator class
- Calculate MXFP of your molecule either from an rdchem.Mol object or directly from SMILES
# Import the required libraries (RDKit, MXFP)
from rdkit import Chem
from mxfp import mxfp
# Convert SMILES to rdchem.Mol object with RDKit (optional)
polymyxin_b2_smiles = 'C[C@H]([C@H]1C(=O)NCC[C@@H](C(=O)N[C@H](C(=O)N[C@@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N1)CCN)CCN)CC(C)C)CC2=CC=CC=C2)CCN)NC(=O)[C@H](CCN)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CCN)NC(=O)CCCCC(C)C)O'
polymyxin_b2_mol = Chem.MolFromSmiles(polymyxin_b2_smiles)
# Initialize the MXFPCalculator class
MXFP = mxfp.MXFPCalculator()
# Calculate MXFP of your molecule either from an rdchem.Mol object or directly from SMILES
polymyxin_b2_mxfp = MXFP.mxfp_from_mol(polymyxin_b2_mol)  # from rdchem.Mol object
polymyxin_b2_mxfp = MXFP.mxfp_from_smiles(polymyxin_b2_smiles)  # from SMILES
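Once fingerprints are available, molecules can be compared by a distance between their vectors. The sketch below uses the city-block (Manhattan) distance, which is an assumption made for illustration; this README does not prescribe a metric. It also assumes the calculator returns something array-like of length 217, and uses random toy vectors in place of real MXFP output so that it runs without RDKit or mxfp. `cityblock_distance` is a hypothetical helper, not part of the package.

```python
import numpy as np


def cityblock_distance(fp_a, fp_b):
    """Sum of absolute element-wise differences between two fingerprints."""
    a = np.asarray(fp_a, dtype=float)
    b = np.asarray(fp_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("fingerprints must have the same length")
    return float(np.abs(a - b).sum())


# Toy 217D vectors standing in for real fingerprints (made-up data).
rng = np.random.default_rng(0)
fp_query = rng.integers(0, 50, size=217)
fp_same = fp_query.copy()
fp_other = rng.integers(0, 50, size=217)

print(cityblock_distance(fp_query, fp_same))   # identical vectors give 0.0
print(cityblock_distance(fp_query, fp_other))  # larger means less similar
```

With real data, the toy vectors would be replaced by the outputs of `mxfp_from_mol` or `mxfp_from_smiles`, and a smaller distance would indicate more similar pharmacophore atom-pair distributions.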
If you are working with 3D coordinates and wish to use Euclidean atom-pair distances instead of topological distances, initialize the MXFPCalculator class using the dimensionality='3D' parameter. Note that you cannot calculate MXFP from a SMILES string with the '3D' option, so you need to provide an rdchem.Mol object.
# Initialize the MXFPCalculator class
MXFP = mxfp.MXFPCalculator(dimensionality='3D')
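If you only have a SMILES string, one possible way to obtain an rdchem.Mol object with 3D coordinates is RDKit's conformer embedding. The helper below is a sketch under assumptions, not part of mxfp: `smiles_to_3d_mol` is a hypothetical name, a single embedded conformer is taken as good enough, and hydrogens are removed again on the assumption that the calculator expects the same heavy-atom molecule as in the 2D example.

```python
def smiles_to_3d_mol(smiles, seed=42):
    """Return an RDKit molecule carrying one embedded 3D conformer."""
    # Imported inside the function so the helper can be defined even where
    # RDKit is not installed; in a normal script, import at the top.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    mol_h = Chem.AddHs(mol)  # explicit hydrogens improve the geometry
    if AllChem.EmbedMolecule(mol_h, randomSeed=seed) == -1:
        raise RuntimeError("3D embedding failed for this molecule")
    AllChem.MMFFOptimizeMolecule(mol_h)  # quick force-field clean-up

    # Assumption: drop hydrogens again; heavy-atom coordinates are kept.
    return Chem.RemoveHs(mol_h)
```

The returned molecule can then be passed to `MXFP.mxfp_from_mol(...)` on a calculator created with `dimensionality='3D'`. Embedding can fail or be slow for large macrocycles and peptides, so coordinates from an experimental structure or a dedicated conformer search may be preferable for such molecules.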