LigandMMPA: A Python repository from mungpeter

AnalogGenerator

Generate a combinatorial library of diverse analogs on core scaffolds by decorating them with a library of R-groups.

====================================================================================

>  0_canonical_smiles_convert.py
       [input: original SMILES file]
       [output: RDKit canonical SMILES file]

e.g)  > x.py original.smi rdkit_canonical.smi

Goal: A quick way to convert any SMILES string into the RDKit-specific canonical SMILES format.
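Below is a minimal sketch of this conversion. It assumes a whitespace-delimited .smi file whose first column is the SMILES string and whose remaining columns (e.g. a name) are copied through unchanged. Lines that RDKit cannot parse are skipped here, and that choice may differ from what the script does.

```python
import sys

from rdkit import Chem


def canonicalize(smiles):
    """Return RDKit canonical SMILES, or None if RDKit cannot parse the input."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def convert_file(in_path, out_path):
    """Rewrite a .smi file: first column canonicalized, other columns kept."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            canonical = canonicalize(fields[0])
            if canonical is None:
                continue  # skip SMILES that RDKit rejects
            fout.write(" ".join([canonical] + fields[1:]) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    convert_file(sys.argv[1], sys.argv[2])
```

Different spellings of one molecule collapse to a single string. For example, both OCC and C(O)C become CCO.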

================================================

>  1_mmpdb_frag_gen.csh 
      [list of SMILES files]

Goal: Parse a chemical library (SMILES) with mmpDB to generate fragments in JSON format (a formatted list of lists).
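The csh script itself is not reproduced here. The sketch below is a rough Python equivalent that assumes the mmpdb 2.x command-line form `mmpdb fragment <input.smi> -o <output.fragments>` and writes one fragment file per SMILES file in the list. The function names and the .fragments output naming are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def read_file_list(list_path):
    """Read one SMILES file name per line, skipping blank lines."""
    with open(list_path) as fh:
        return [line.strip() for line in fh if line.strip()]


def fragment_commands(smiles_files):
    """Build one (assumed) mmpdb call per SMILES file: lib.smi -> lib.fragments."""
    return [
        ["mmpdb", "fragment", str(f), "-o", str(Path(f).with_suffix(".fragments"))]
        for f in smiles_files
    ]


def run_fragmentation(list_path):
    """Run mmpdb on every file named in list_path."""
    for cmd in fragment_commands(read_file_list(list_path)):
        subprocess.run(cmd, check=True)  # stop at the first failing file
```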

================================================

>   2_parse_mmpdb_frag.py
       -list [list of SMILES files for mmpDB fragmentation]
       -size [heavy atom count of fragment to be saved]
       -out  [mmpDB output prefix]
       -regex [Optional: Regular Expression of atomtypes to be excluded] 
                         (def: "c|n|s|N|S|O|P|F|Cl|Br|I|Se|Te|B")

e.g)  > x.py -list smi.list -size 10 -out mmpdb_frag \
             -regex "c|n|s"      (skip fragments containing aromatic c, n, or s atoms)

Goal: Parse the JSON results of mmpDB's fragmentation step, which was originally designed for Matched Molecular Pair Analysis. Here the fragments are collected into a list of SMILES strings, with the attachment point designated as [*], and written out as a CSV file.
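The collection step might look roughly like the sketch below. It assumes the fragment SMILES have already been pulled out of mmpDB's JSON records, because that record layout depends on the mmpDB version and is not reproduced here. It also treats -size as an upper bound on heavy atoms, which is an assumption, and it drops any fragment whose SMILES string matches the -regex pattern. The function names are hypothetical.

```python
import csv
import re

from rdkit import Chem

DEFAULT_EXCLUDE = "c|n|s|N|S|O|P|F|Cl|Br|I|Se|Te|B"


def filter_fragments(fragments, max_heavy, exclude_regex=DEFAULT_EXCLUDE):
    """Keep unique fragment SMILES that avoid excluded atom types and fit the size limit."""
    pattern = re.compile(exclude_regex) if exclude_regex else None
    kept = set()
    for smi in fragments:
        if pattern is not None and pattern.search(smi):
            continue  # contains an excluded atom type
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # unparsable fragment
        # GetNumHeavyAtoms counts atomic number > 1, so the [*] dummy is not counted
        if mol.GetNumHeavyAtoms() <= max_heavy:
            kept.add(smi)
    return sorted(kept)


def write_fragment_csv(fragments, out_prefix):
    """Write one fragment SMILES per row to <out_prefix>.csv."""
    with open(out_prefix + ".csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        for smi in fragments:
            writer.writerow([smi])
```

With the default pattern, only fragments made of aliphatic carbon survive. With "c|n|s", aliphatic heteroatoms such as O are allowed through.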

Note: mmpDB places the [*] at the beginning of the string, which may cause problems for the Chem.CanonSmiles() function when R-groups are combined with the core molecule. To avoid this, the [*] flag is moved behind the first atom (after any ring-closure digits on that atom):

      [*]CCCC          -->  C([*])CCC
      [*]C1CC1         -->  C1([*])CC1
      [*][C@H]12CC1C2  -->  [C@H]12([*])CC1C2
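The relocation above can be done at the string level. The sketch below shows one assumed way to do it with regular expressions. It handles bracket atoms, organic-subset atoms, and ring-closure labels on the first atom, and it reproduces the three examples. It copies @/@@ tags unchanged, exactly as the examples do. Moving a neighbor reorders the neighbors of a chiral atom, so fragments with a stereocenter on the first atom should be re-checked, for example by re-canonicalizing them with RDKit.

```python
import re

# A leading attachment point such as "[*]" or a mapped "[*:1]"
_LEAD_DUMMY = re.compile(r"^\[\*(?::\d+)?\]")
# First atom: a bracket atom, a two-letter organic-subset atom, or a one-letter one
_FIRST_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Ring-closure labels (each optionally preceded by a bond symbol) written on that atom
_RING_LABELS = re.compile(r"(?:[-=#:/\\]?(?:%\d\d|\d))*")


def move_attachment_point(smiles):
    """Rewrite '[*]C1CC1' as 'C1([*])CC1'; strings without a leading [*] are returned as-is.

    Chirality tags are copied unchanged; they are NOT re-derived after the move.
    """
    m = _LEAD_DUMMY.match(smiles)
    if m is None:
        return smiles
    dummy, rest = m.group(0), smiles[m.end():]
    atom = _FIRST_ATOM.match(rest)
    if atom is None:
        raise ValueError(f"cannot parse the first atom of {smiles!r}")
    rings = _RING_LABELS.match(rest, atom.end())
    # Ring-closure digits must stay directly on the atom, so the branch goes after them
    return f"{rest[:rings.end()]}({dummy}){rest[rings.end():]}"
```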

================================================

>   3_weld_r_groups.py
       -templ [Core scaffold SMILES with attachment points marked by "x" (e.g. CxxCx=C)]
       -r     [CSV file of R-groups with the attachment point marked by "[*]", generated by 2_parse_mmpdb_frag.py]
       -out   [Output prefix]
       -raw        [Optional: Read in pre-generated analog intermediate file (pickle file)]
       -unsat_min  [Optional: Remove molecules with degree of unsaturation less than this (def: None)]
       -unsat_max  [Optional: Remove molecules with degree of unsaturation larger than this (def: None)]

e.g) >   x.py -templ core_template.smi -r mmpdb_frag.csv -out combinatorial_analogs \
              -raw analog_intermediate.pickle.bz2 \
              -unsat_min 2 -unsat_max 5
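The -unsat_min/-unsat_max filter relies on the degree of unsaturation, DoU = (2C + 2 + N - H - X) / 2, where X counts halogens. Divalent atoms such as O and S do not change the value. The RDKit sketch below assumes neutral molecules and ignores elements other than C, N, H and the halogens; the script's own formula may differ. The function names are hypothetical.

```python
from rdkit import Chem

HALOGENS = {"F", "Cl", "Br", "I"}


def degree_of_unsaturation(smiles):
    """DoU = (2C + 2 + N - H - X) / 2 for a neutral molecule given as SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    c = n = h = x = 0
    for atom in mol.GetAtoms():
        h += atom.GetTotalNumHs()  # implicit + explicit hydrogens on this atom
        symbol = atom.GetSymbol()
        if symbol == "C":
            c += 1
        elif symbol == "N":
            n += 1
        elif symbol in HALOGENS:
            x += 1
        # other elements are ignored: correct for divalent O and S, an approximation otherwise
    return (2 * c + 2 + n - h - x) / 2


def passes_unsat(smiles, unsat_min=None, unsat_max=None):
    """Mimic -unsat_min/-unsat_max: reject DoU below the minimum or above the maximum."""
    dou = degree_of_unsaturation(smiles)
    if unsat_min is not None and dou < unsat_min:
        return False
    if unsat_max is not None and dou > unsat_max:
        return False
    return True
```

For example, benzene has DoU 4 (one ring plus three double bonds), so it passes the "-unsat_min 2 -unsat_max 5" setting in the example above. Ethanol (DoU 0) is removed.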

Goal: Create a combinatorial analog library of a core scaffold using a fragment library. During generation, the intermediates from the "Combine Core/R-group to generate molecule" step are saved into a pickle.bz2 file in case a later step fails; this file can be read back in with -raw.
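The checkpoint can be written and read with the standard-library bz2 and pickle modules. This is a minimal sketch: the function names and the way a -raw file short-circuits generation are assumptions for illustration.

```python
import bz2
import pickle


def save_intermediate(obj, path):
    """Pickle obj into a bz2-compressed file such as analog_intermediate.pickle.bz2."""
    with bz2.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_intermediate(path):
    """Read back an object written by save_intermediate."""
    with bz2.open(path, "rb") as fh:
        return pickle.load(fh)


def combine_or_load(raw_path, checkpoint_path, build):
    """Reuse a -raw checkpoint if one is given; otherwise build() the intermediates and save them."""
    if raw_path is not None:
        return load_intermediate(raw_path)
    data = build()
    save_intermediate(data, checkpoint_path)
    return data
```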

  1. Create all permutations of cores with different numbers of branch points, limited by the predefined number of allowed R-group carbon atoms
  2. Create all permutations of the R-groups for different numbers of R-groups, limited by the predefined number of allowed R-group carbon atoms
  3. Combine core scaffolds (with designated branch points) and R-groups (with attachment points) into single molecules
  4. Remove molecules whose degree of unsaturation does not fit the criteria (-unsat_min / -unsat_max)
  5. Remove duplicated SMILES (strings are tautomerized and canonicalized before comparison)
  6. Generate all possible stereoisomers (E/Z, diastereomers) and remove meso isomers (SMILES are tautomerized and canonicalized)
  7. Report results
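The combine, deduplicate, and stereoisomer steps can be sketched with RDKit as shown below. This is not the script's code, and the carbon-count limits are omitted. The sketch makes the following assumptions:

- Each "x" in the template takes exactly one R-group carrying a single [*].
- Attachment points are numbered and then joined with Chem.molzip, which requires a recent RDKit release that provides it.
- R-groups are assigned to sites with repetition via itertools.product.
- Duplicates are removed using RDKit's canonical tautomer SMILES.
- Unassigned stereocenters and double bonds are enumerated with EnumerateStereoisomers.

All function names are hypothetical.

```python
import itertools
import re

from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import (
    EnumerateStereoisomers,
    StereoEnumerationOptions,
)
from rdkit.Chem.MolStandardize import rdMolStandardize

_TAUTOMERS = rdMolStandardize.TautomerEnumerator()


def label_template(template):
    """Replace each 'x' with a numbered attachment branch: CxC -> C([*:1])C."""
    counter = itertools.count(1)
    return re.sub("x", lambda _m: f"([*:{next(counter)}])", template)


def weld(template, rgroups):
    """Join one R-group (each with a single [*]) onto each 'x' of the core."""
    if template.count("x") != len(rgroups):
        raise ValueError("need exactly one R-group per 'x' in the template")
    core = Chem.MolFromSmiles(label_template(template))
    # Give the i-th R-group the map number of the i-th 'x' so molzip pairs them
    labelled = [r.replace("[*]", f"[*:{i}]", 1) for i, r in enumerate(rgroups, 1)]
    frags = Chem.MolFromSmiles(".".join(labelled))
    return Chem.MolToSmiles(Chem.molzip(core, frags))


def tautomer_key(smiles):
    """Canonical tautomer SMILES, used as the deduplication key."""
    return Chem.MolToSmiles(_TAUTOMERS.Canonicalize(Chem.MolFromSmiles(smiles)))


def enumerate_analogs(template, rgroup_pool):
    """Weld every assignment of pool R-groups to the template's sites; drop duplicates."""
    unique = set()
    for combo in itertools.product(rgroup_pool, repeat=template.count("x")):
        unique.add(tautomer_key(weld(template, combo)))
    return sorted(unique)


def stereoisomers(smiles):
    """Enumerate unassigned stereo; a set of canonical SMILES collapses meso duplicates."""
    opts = StereoEnumerationOptions(onlyUnassigned=True, unique=True)
    isomers = EnumerateStereoisomers(Chem.MolFromSmiles(smiles), options=opts)
    return sorted({Chem.MolToSmiles(m) for m in isomers})
```

For example, enumerate_analogs("Cx", ["[*]C", "[*]O"]) gives ["CC", "CO"]. R-groups in the relocated form, such as C([*])CCC, work too, because only the [*] label is rewritten.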
