Generate a combinatorial library of diverse analogs on core scaffolds by decorating with a library of R-groups
====================================================================================
> 0_canonical_smiles_convert.py
[input: original SMILES file]
[output: RDKit canonical SMILES file]
e.g) > x.py original.smi rdkit_canonical.smi
Goal: Quickly convert arbitrary SMILES strings into RDKit canonical SMILES.
================================================
> 1_mmpdb_frag_gen.csh
[input: list of SMILES files]
Goal: Fragment a chemical library (SMILES) with mmpDB; the fragments are written in JSON format (formatted list of lists)
================================================
> 2_parse_mmpdb_frag.py
-list [list of SMILES files for mmpDB fragmentation]
-size [heavy atom count of fragment to be saved]
-out [mmpDB output prefix]
-regex [Optional: Regular Expression of atom types to be excluded]
(def: "c|n|s|N|S|O|P|F|Cl|Br|I|Se|Te|B")
e.g) > x.py -list smi.list -size 10 -out mmpdb_frag \
-regex "c|n|s" (don't collect any aromatic fragments)
Goal: Parse the JSON results of mmpDB's fragmentation step, which was originally designed for Matched Molecular Pair Analysis. Here the fragments are collected into a list of SMILES strings, with the attachment point designated as [*], and written to a CSV file.
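The -size and -regex filters can be pictured with the plain-Python sketch below. It is not the code of 2_parse_mmpdb_frag.py: the function names are hypothetical, -size is assumed here to be an upper limit on heavy atoms, and atoms are counted by tokenizing the SMILES text rather than by RDKit.

```python
import re

# One token per atom: bracket atoms, two-letter halogens, then the
# single-letter organic subset (aliphatic and aromatic).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def heavy_atom_count(smiles):
    """Count atom tokens, skipping the [*] attachment point and explicit [H]."""
    tokens = ATOM_TOKEN.findall(smiles)
    return sum(1 for t in tokens if t not in ("[*]", "[H]"))


def keep_fragment(smiles, max_heavy, exclude_regex=None):
    """Hypothetical filter: True if the fragment should be collected."""
    # Assumption: -size is treated as an upper limit on heavy atoms.
    if heavy_atom_count(smiles) > max_heavy:
        return False
    # The regex is matched against the raw SMILES text, so "c|n|s" rejects
    # any fragment that contains an aromatic c, n or s atom symbol.
    if exclude_regex and re.search(exclude_regex, smiles):
        return False
    return True
```

With -regex "c|n|s", keep_fragment("[*]c1ccccc1", 10, "c|n|s") rejects the phenyl fragment while "[*]CCCC" passes. Because the match is purely textual, a pattern such as "B" also hits "Br".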
Note: mmpDB places [*] at the beginning of the string, which may cause problems for the Chem.CanonSmiles() function when combining R-groups with the core molecule. Move the [*] flag behind the first atom:
[*]CCCC --> C([*])CCC
[*]C1CC1 --> C1([*])CC1
[*][C@H]12CC1C2 --> [C@H]12([*])CC1C2
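The three rewrites above can be reproduced with a small text-level sketch. The function name is hypothetical and this is not necessarily how the script does it. It moves a leading [*] behind the first atom and that atom's ring-closure labels. Chirality tags (@/@@) are copied as written, so the sketch does not check whether a tag needs adjusting after the neighbour order changes, and bond symbols in front of ring-closure digits are not handled.

```python
import re

LEADING_DUMMY = re.compile(
    r"^\[\*\]"                                  # attachment point at the start
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops])"  # first real atom
    r"((?:%\d{2}|\d)*)"                         # its ring-closure labels, if any
)


def move_attachment_point(smiles):
    """Rewrite '[*]X...' as 'X([*])...'; return other strings unchanged."""
    match = LEADING_DUMMY.match(smiles)
    if match is None:
        return smiles
    atom, rings = match.groups()
    return f"{atom}{rings}([*]){smiles[match.end():]}"
```

For instance, move_attachment_point("[*]C1CC1") gives "C1([*])CC1", and a string that does not start with [*] is returned unchanged.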
================================================
> 3_weld_r_groups.py
-templ [Core Scaffold SMILES with attachment points marked by "x" (e.g. CxxCx=C)]
-r [CSV file of R groups with attachment point marked by "[*]" generated by 2_parse_mmpdb_frag.py]
-out [Output prefix]
-raw [Optional: Read in pre-generated analog intermediate file (pickle file)]
-unsat_min [Optional: Remove molecules with degree of unsaturation less than this value (def: None)]
-unsat_max [Optional: Remove molecules with degree of unsaturation larger than this value (def: None)]
e.g) > x.py -templ core_template.smi -r mmpdb_frag.csv -out combinatorial_analogs \
-raw analog_intermediate.pickle.bz2 \
-unsat_min 2 -unsat_max 5
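For orientation on -unsat_min/-unsat_max: how the script computes the degree of unsaturation is not stated here, so the sketch below uses the textbook formula for molecules made of C, H, N and halogens (divalent atoms such as O and S do not change the count). Both function names are hypothetical.

```python
def degree_of_unsaturation(c, h, n=0, x=0):
    """Textbook formula: DoU = (2C + 2 + N - H - X) / 2.

    c, h, n, x are the numbers of carbon, hydrogen, nitrogen and halogen atoms.
    """
    return (2 * c + 2 + n - h - x) / 2


def passes_unsat_filter(dou, unsat_min=None, unsat_max=None):
    """Keep a molecule unless it falls below unsat_min or above unsat_max."""
    if unsat_min is not None and dou < unsat_min:
        return False
    if unsat_max is not None and dou > unsat_max:
        return False
    return True
```

With -unsat_min 2 -unsat_max 5, benzene (C6H6, DoU 4) is kept and cyclohexane (C6H12, DoU 1) is removed. Leaving both limits at None keeps everything.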
Goal: Create a combinatorial analog library of a core scaffold using a fragment library. During generation, intermediates from the "Combine Core/R-group to generate molecule" step are saved to a pickle.bz2 file in case a later step fails.
- create all permutations of cores with different numbers of branch points, limited by a predefined number of allowed R-group carbon atoms
- create all permutations of different numbers of R-groups, limited by a predefined number of allowed R-group carbon atoms
- combine core scaffolds (with designated branch points) and R-groups (with branch points) into one molecule
- remove molecules whose degree of unsaturation does not fit the criteria
- remove duplicated SMILES (strings are tautomerized and canonicalized)
- generate all possible stereoisomers (E/Z, diastereomers) and remove meso-isomers (SMILES are tautomerized and canonicalized)
- report results
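
The enumeration and combination steps above can be sketched at the string level as follows. This is an illustration under stated assumptions, not the implementation in 3_weld_r_groups.py: the function names and the max_r_carbons parameter are hypothetical, each "x" is either left empty or filled with one R-group, and the R-group is attached as a SMILES branch after dropping its "([*])" marker. Carbons are counted by a rough text match. Ring-closure digits that clash with an open ring in the core, and stereo tags on the attachment atom, are not handled.

```python
import re
from itertools import product


def carbon_count(r_group):
    """Rough count of C/c symbols in a SMILES string, ignoring the C of Cl."""
    return sum(1 for t in re.findall(r"Cl|C|c", r_group) if t != "Cl")


def weld(template, r_groups):
    """Fill each 'x' of the template, left to right, with one entry of r_groups.

    None leaves that attachment point unsubstituted.
    """
    pieces = template.split("x")
    if len(r_groups) != len(pieces) - 1:
        raise ValueError("need exactly one choice per attachment point")
    out = [pieces[0]]
    for r_group, piece in zip(r_groups, pieces[1:]):
        if r_group is not None:
            # Drop the attachment marker and hang the R-group on as a branch.
            out.append("(" + r_group.replace("([*])", "", 1) + ")")
        out.append(piece)
    return "".join(out)


def enumerate_analogs(template, r_library, max_r_carbons=None):
    """Yield every way of decorating the template within the carbon budget."""
    n_sites = template.count("x")
    choices = [None] + list(r_library)
    for combo in product(choices, repeat=n_sites):
        total = sum(carbon_count(r) for r in combo if r is not None)
        if max_r_carbons is not None and total > max_r_carbons:
            continue
        yield weld(template, combo)
```

For the template CxxCx=C, weld("CxxCx=C", ("C([*])C", None, "C1([*])CC1")) gives C(CC)C(C1CC1)=C. The raw output can contain several strings for the same molecule (C=C(CC) and C(CC)=C, for example), which is why duplicates are removed after canonicalization in the later steps.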
####################################################################