Chemical space exploration is a major task of the hit-finding process in the pursuit of novel chemical entities. Compared with other screening technologies, computational de novo design has become a popular approach to overcoming the limitations of current chemical libraries. Here, we report a de novo design platform named systemic evolutionary chemical space explorer (SECSE). The platform was conceptually inspired by fragment-based drug design and miniaturizes a “lego-building” process within the pocket of a given target. The key to virtual hit generation thus becomes a computational search problem. To enhance search and optimization, human intelligence and deep learning are integrated. SECSE has the potential to find novel and diverse small molecules that are attractive starting points for further validation.
-
Setting up dependencies
python ~=3.9, perl ~=5.32
conda create --name secse -c conda-forge parallel tqdm biopandas openbabel chemprop xlrd=2 pandarallel rdkit=2022.09
conda activate secse
-
Installing from source
git clone https://github.com/KeenThera/SECSE.git
-
Setting Environment Variables
export SECSE=/absolute/path/to/SECSE
I'm using AutoDock Vina for docking: (download here)
export VINA=/absolute/path/to/AutoDockVINA
I'm using AutoDock GPU: (adgpu-v1.5.3_linux_ocl_128wi) (download here)
export AUTODOCK_GPU=/absolute/path/to/AutoDockGPU
I'm using Glide for docking (additional installation & license required):
export SCHRODINGER=/absolute/path/to/SCHRODINGER
I'm using Uni-Dock for docking (requires a GPU):
compile from the Uni-Dock source code (recommended), or download here and add
export UNIDOCK=/absolute/path/to/UNIDOCK
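Before launching a run, it can help to confirm that the variables exported above are actually visible to Python. The following is a minimal sketch, not part of SECSE: the helper `missing_env_vars` is hypothetical, and the `unidock` key is an assumed name for the Uni-Dock option, since this README does not state the corresponding `docking_program` value.

```python
import os

# Environment variable expected for each docking program, following the
# export lines above. The "unidock" key is an assumption: the README does
# not state which docking_program value selects Uni-Dock.
DOCKING_ENV_VARS = {
    "vina": "VINA",
    "autodock-gpu": "AUTODOCK_GPU",
    "glide": "SCHRODINGER",
    "unidock": "UNIDOCK",
}


def missing_env_vars(docking_program, environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    required = ["SECSE", DOCKING_ENV_VARS[docking_program]]
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars("vina")
    if missing:
        print("Please export:", ", ".join(missing))
```

Passing a plain dictionary as `environ` makes the check easy to try without touching the real environment.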
-
Giving execution permissions to the SECSE directory
chmod -R +x /absolute/path/to/SECSE
-
Input fragments: a tab-separated .smi file without a header. See demo here.
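As an illustration of the expected file shape, the sketch below writes and checks a headerless, tab-separated file. The column layout (SMILES first, fragment ID second) and the helper names are assumptions made for this example; the demo file remains the authoritative reference.

```python
import csv


def write_fragments(path, fragments):
    """Write (SMILES, ID) pairs as a tab-separated file with no header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(fragments)


def check_fragments(path, expected_columns=2):
    """Return the 1-based line numbers whose column count is unexpected."""
    bad_lines = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_number, row in enumerate(reader, start=1):
            if len(row) != expected_columns:
                bad_lines.append(line_number)
    return bad_lines
```

A file that passes `check_fragments` is only structurally valid; this sketch does not verify that the SMILES strings themselves can be parsed.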
-
Parameters in config file:
[DEFAULT]
- project_code, project identifier, which will be prefixed to each generated molecule ID, type=str
- workdir, working directory, created if it does not exist, otherwise overwritten, type=str
- fragments, file path to seed fragments, smi format, type=str
- num_per_gen, number of molecules generated each generation, type=int
- seed_per_gen, number of selected seed molecules per generation, default=1000, type=int
- start_gen, number of the starting generation; to resume a previous run, set 'start_gen' to the number of the last completed generation of that run, default=0, type=int
- num_gen, number of growing generations, the final generation number will be the sum of start_gen and num_gen, type=int
- docking_program, name of docking program, AutoDock-Vina (input vina) or AutoDock-GPU (input autodock-gpu) or Glide (input glide), default=vina, type=str
- cpu, maximum number of CPUs to use, type=int
- gpu, maximum number of GPUs to use for AutoDock GPU, type=int
- rule_db, path to customized rules in json format, set to 0 to use the default rules, default=0
[docking]
- target, protein PDBQT file if using AutoDock Vina; grid map descriptor .fld file if using AutoDock GPU; grid file if using Glide, type=str
- RMSD, docking pose RMSD cutoff between children and parent, default=2, type=float
- delta_score, decreased docking score cutoff between children and parent, default=-1.0, type=float
- score_cutoff, default=-9, type=float
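One possible reading of the RMSD and delta_score cutoffs above is sketched here: a child is kept when its pose stays close to the parent's pose and its docking score drops by at least the required margin. This is an interpretation for illustration only. The function name, the inclusive comparisons, and the omission of score_cutoff (whose exact role is not described here) are all assumptions, not SECSE's actual selection logic.

```python
def passes_parent_checks(child_score, parent_score, pose_rmsd,
                         rmsd_cutoff=2.0, delta_score=-1.0):
    """Return True when the child keeps the parent's pose and scores
    better than the parent by at least the required margin.

    Docking scores are treated as "lower is better", so an improvement
    shows up as a negative difference (child minus parent).
    """
    keeps_pose = pose_rmsd <= rmsd_cutoff
    improves_enough = (child_score - parent_score) <= delta_score
    return keeps_pose and improves_enough
```

For example, with the default cutoffs, a child scoring -8.5 against a parent scoring -7.0 at an RMSD of 1.2 would pass under this reading, while a child scoring -7.5 would not.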
Parameters when docking by AutoDock Vina:
- x, Docking box x, type=float
- y, Docking box y, type=float
- z, Docking box z, type=float
- box_size_x, Docking box size x, default=20, type=float
- box_size_y, Docking box size y, default=20, type=float
- box_size_z, Docking box size z, default=20, type=float
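If a reference ligand is available, the box values can be derived from its atom coordinates. The sketch below assumes that x, y, z denote the box center and that the sizes are edge lengths in the same units as the coordinates, as in AutoDock Vina's own box definition; the helper `docking_box` and the padding value are hypothetical choices for this example.

```python
def docking_box(coords, padding=4.0):
    """Return (center, size) of an axis-aligned box that encloses all
    points in coords, extended by padding on every side."""
    if not coords:
        raise ValueError("coords must contain at least one (x, y, z) point")
    mins = [min(point[axis] for point in coords) for axis in range(3)]
    maxs = [max(point[axis] for point in coords) for axis in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


# Two illustrative atom positions (not taken from a real structure).
center, size = docking_box([(0.0, 0.0, 0.0), (10.0, 4.0, 2.0)])
print(center)  # (5.0, 2.0, 1.0)
print(size)    # (18.0, 12.0, 10.0)
```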
[deep learning]
- mode, mode of deep learning modeling, 0: not used, 1: modeling per generation, 2: modeling overall after all generations, default=0, type=int
- dl_per_gen, top N predicted molecules for docking, default=100, type=int
- dl_score_cutoff, default=-9, type=float
[properties]
- MW, molecular weights cutoff, default=450, type=int
- logP_lower, minimum of logP, default=0.5, type=float
- logP_upper, maximum of logP, default=7, type=float
- chiral_center, maximum number of chiral centers, default=2, type=int
- heteroatom_ratio, maximum of heteroatom ratio, default=0.35, type=float
- rdkit_rotatable_bound_num, maximum number of rotatable bonds calculated by rdkit.rdMolDescriptors.CalcNumRotatableBonds, default=5, type=int
- keen_rotatable_bound_num, maximum number of rotatable bonds as defined by KEEN (SMARTS: "[C^3!D1;!$(C(F)(F)F)]-!@[!Br!F!Cl!I!H3&!$(#)!D1;!$(!Br!F!Cl!I(F)F)]"), default=3, type=int
- rigid_body_num, maximum number of rigid bodies as defined by KEEN (SMARTS: "[C^3!D1;!$(C(F)(F)F);!R;!$(C=O(N));!$(NC(=O));!$(C(=O)O);!$(C(=O)O)]-!@[!Br!F!Cl!I!H3&!$(#)!D1;!$(!Br!F!Cl!I(F)F);!R;!$(C=O([N,O]));!$(NC(=O));!$(C(=O)O)]"), default=2, type=int
- HBD, maximum number of hydrogen bond donors calculated by rdkit.rdMolDescriptors.CalcNumHBD, default=5, type=int
- HBA, maximum number of hydrogen bond acceptors calculated by rdkit.rdMolDescriptors.CalcNumHBA, default=10, type=int
- TPSA, maximum topological polar surface area calculated by rdkit.Chem.Descriptors.TPSA, default=200, type=float
- lipinski_violation, maximum number of Lipinski rule-of-five violations calculated by RDKit, default=1, type=int
- QED, QED (calculated by rdkit.Chem.QED.qed) cutoff value, default=0.5, type=float
- max_ring_size, maximum of ring size, default=7, type=int
- max_ring_system_size, maximum of ring system member size in one ring system, default=3, type=int
- ring_system_count, maximum of separated ring system count, default=4, type=int
- bridged_site_count, maximum of bridged ring site count, default=2, type=int
- spiro_site_count, maximum of spiro ring site count, default=1, type=int
- fused_site_count, maximum of fused ring site count, default=3, type=int
- rdkit_sa_score, synthetic accessibility score (calculated by RDKit) cutoff, default=5, type=float
- substructure_filter, file containing customized unwanted substructure SMARTS in "*.xls" format; set the value to 0 if you do not have any additional unwanted substructures. PAINS filters are already included by default. The file should include columns for Pattern, ID, and Max, where the ID should be unique for each SMARTS. You can refer to the example file subtructure_filter_demo.xls, default=0, type=string
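To show how the upper and lower limits in the [properties] list combine, here is a minimal sketch that checks precomputed descriptor values against a subset of the defaults. It is not SECSE's implementation: the function name and the dictionary layout are hypothetical, and the values are assumed to have been computed beforehand (for instance with the RDKit functions named in the list).

```python
# Defaults copied from the [properties] list; only a subset is shown.
DEFAULT_LIMITS = {
    "MW": 450,
    "logP_lower": 0.5,
    "logP_upper": 7,
    "HBD": 5,
    "HBA": 10,
    "TPSA": 200,
}


def failed_properties(values, limits=DEFAULT_LIMITS):
    """Return the names of the limits that a molecule's precomputed
    descriptor values violate (an empty list means it passes)."""
    failures = []
    if values["MW"] > limits["MW"]:
        failures.append("MW")
    # logP is the only property here with both a lower and an upper bound.
    if values["logP"] < limits["logP_lower"]:
        failures.append("logP_lower")
    if values["logP"] > limits["logP_upper"]:
        failures.append("logP_upper")
    for name in ("HBD", "HBA", "TPSA"):
        if values[name] > limits[name]:
            failures.append(name)
    return failures
```

Returning the names of the violated limits, rather than a single boolean, makes it easier to see which cutoff is discarding most molecules when tuning a config.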
Config file of a demo case: phgdh_demo_vina.ini
Customized rule json template: rules.json. Rule IDs should be in the form G-001-XXXX, like G-001-0001, G-001-0002, G-001-0003 ...
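The parameters listed above can also be assembled programmatically. The sketch below uses Python's standard `configparser` to build a file with the same section names; whether SECSE itself reads the file with `configparser` is an assumption, and the project code, `num_per_gen`, `num_gen`, `cpu`, and all paths are placeholder values rather than recommendations. The demo config remains the reference for a complete file.

```python
import configparser


def build_config(workdir, fragments, target, box_center):
    """Assemble a SECSE-style config; all example values are placeholders."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key case, e.g. RMSD and MW
    config["DEFAULT"] = {
        "project_code": "demo",
        "workdir": workdir,
        "fragments": fragments,
        "num_per_gen": "200",
        "seed_per_gen": "1000",
        "start_gen": "0",
        "num_gen": "3",
        "docking_program": "vina",
        "cpu": "8",
        "rule_db": "0",
    }
    x, y, z = box_center
    config["docking"] = {
        "target": target,
        "RMSD": "2",
        "delta_score": "-1.0",
        "score_cutoff": "-9",
        "x": str(x),
        "y": str(y),
        "z": str(z),
        "box_size_x": "20",
        "box_size_y": "20",
        "box_size_z": "20",
    }
    config["deep learning"] = {
        "mode": "0",
        "dl_per_gen": "100",
        "dl_score_cutoff": "-9",
    }
    config["properties"] = {"MW": "450", "logP_lower": "0.5", "logP_upper": "7"}
    return config


if __name__ == "__main__":
    cfg = build_config("/absolute/path/to/workdir",
                       "/absolute/path/to/fragments.smi",
                       "/absolute/path/to/target.pdbqt",
                       (0.0, 0.0, 0.0))
    with open("my_config.ini", "w") as handle:
        cfg.write(handle)
```

With `start_gen` set to 0 and `num_gen` set to 3 as above, the final generation number would be 3, following the rule that it is the sum of the two.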
Run SECSE
python $SECSE/run_secse.py --config /absolute/path/to/config
Please provide the absolute path of the config file here.
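When launching SECSE from another Python script, the same command can be built with a guard for the absolute-path requirement. This is a minimal sketch under the assumption that the `SECSE` environment variable is set as described earlier; the helper names `secse_command` and `run_secse` are hypothetical.

```python
import os
import subprocess
import sys


def secse_command(config_path, environ=None):
    """Build the argument list for run_secse.py, insisting on an
    absolute config path."""
    environ = os.environ if environ is None else environ
    if not os.path.isabs(config_path):
        raise ValueError(f"config path must be absolute: {config_path}")
    script = os.path.join(environ["SECSE"], "run_secse.py")
    return [sys.executable, script, "--config", config_path]


def run_secse(config_path):
    """Run SECSE and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(secse_command(config_path), check=True)
```

Using `sys.executable` keeps the run inside the currently active interpreter, which is convenient when the `secse` conda environment is already activated.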
Output files
- merged_docked_best_timestamp_with_grow_path.csv: selected molecules and growing path
- selected.sdf: 3D conformers of all selected molecules
GNU Parallel installation
- CentOS / RHEL
sudo yum install parallel
- Ubuntu / Debian
sudo apt-get install parallel
- From source: https://www.gnu.org/software/parallel/
python ~=3.9, perl ~=5.32
numpy~=1.24.3, pandas~=1.3.3, xlrd~=2.0.1, pandarallel~=1.5.2, tqdm~=4.65.0, biopandas~=0.4.1, openbabel~=3.1.1, rdkit~=2022.09, chemprop~=1.5.2, pytorch~=2.0.0+cu117
Linux server with CPUs only also works.
Lu, C.; Liu, S.; Shi, W.; Yu, J.; Zhou, Z.; Zhang, X.; Lu, X.; Cai, F.; Xia, N.; Wang, Y. Systemic Evolutionary Chemical Space Exploration For Drug Discovery. J Cheminform 14, 19 (2022).
https://doi.org/10.1186/s13321-022-00598-4
SECSE is released under Apache License, Version 2.0.
The project is under active development. If you have any questions or suggestions, please contact: lu_chong@keenthera.com