/LigandMMPA

Prepare and fragment ligands for matched molecular pair analysis

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

AnalogGenerator

Generate a combinatorial library of diverse analogs by decorating core scaffolds with R-groups from a fragment library

====================================================================================

>  0_canonical_smiles_convert.py
       [input: original SMILES file]
       [output: RDKit canonical SMILES file]

e.g)  > x.py original.smi rdkit_canonical.smi

Goal: Quickly convert SMILES strings into RDKit canonical SMILES.

================================================

>  1_mmpdb_frag_gen.csh 
      [list of SMILES files]

Goal: Fragment a chemical library (SMILES) with mmpDB; the fragments are written in JSON format (formatted list of lists)

================================================

>   2_parse_mmpdb_frag.py
       -list [list of SMILES files for mmpDB fragmentation]
       -size [heavy atom count of fragment to be saved]
       -out  [mmpDB output prefix]
       -regex [Optional: Regular expression of atom types to be excluded] 
                         (def: "c|n|s|N|S|O|P|F|Cl|Br|I|Se|Te|B")

e.g)  > x.py -list smi.list -size 10 -out mmpdb_frag \
             -regex "c|n|s"      (don't collect any aromatic fragments)

Goal: Parse the JSON results of mmpDB's fragmentation step, which was originally intended for Matched Molecular Pair Analysis. Here the results are collected into a list of fragment SMILES strings, with each attachment point designated as [*], and written to a CSV file.
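The effect of the -regex option can be illustrated with a minimal sketch. It assumes the exclusion is a plain regular-expression search over each fragment SMILES string; the actual logic in 2_parse_mmpdb_frag.py may differ. The function name, the fragment list, and the "smiles" column header are hypothetical and used only for illustration.

```python
import csv
import io
import re


def filter_fragments(smiles_list, exclude_regex):
    """Keep only the fragment SMILES that do not match exclude_regex."""
    pattern = re.compile(exclude_regex)
    return [smi for smi in smiles_list if not pattern.search(smi)]


# Hypothetical fragment SMILES, each with a leading [*] attachment point.
fragments = ["[*]CCCC", "[*]c1ccccc1", "[*]C1CC1", "[*]c1ccncc1", "[*]CCl"]

# "c|n|s" matches lowercase (aromatic) atom symbols, so aromatic fragments are dropped.
kept = filter_fragments(fragments, "c|n|s")

# Write the surviving fragments as a one-column CSV (in memory here; a real
# script would write to a file). The header name is an assumption.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["smiles"])
for smi in kept:
    writer.writerow([smi])
csv_text = buffer.getvalue()

print(kept)
```

Because the match is purely textual, a pattern such as "B" would also exclude fragments containing "Br", which is worth keeping in mind when writing a custom expression.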

Note: mmpDB places [*] at the beginning of the string, which may cause problems for the Chem.CanonSmiles() function when R-groups are combined with the core molecule. Move the [*] flag behind the first atom:

      [*]CCCC          -->  C([*])CCC
      [*]C1CC1         -->  C1([*])CC1
      [*][C@H]12CC1C2  -->  [C@H]12([*])CC1C2
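
The rewrite shown above can be done with a small string helper. This is a minimal sketch, not part of the repository: the name move_attachment_point is hypothetical, and it assumes the [*] is followed directly by an atom (no explicit bond symbol) and, optionally, by ring-closure labels. Anything else is returned unchanged.

```python
import re

# A leading [*], then the first real atom (bracket atom, Cl, Br, or a
# single-letter symbol), then any ring-closure labels (digits or %nn).
_LEADING_STAR = re.compile(
    r"^\[\*\]"
    r"(\[[^\]]+\]|Cl|Br|[A-Za-z])"
    r"((?:%\d{2}|\d)*)"
)


def move_attachment_point(smiles):
    """Move a leading [*] into a branch directly behind the first atom."""
    match = _LEADING_STAR.match(smiles)
    if match is None:
        # No leading [*], or a pattern this sketch does not handle.
        return smiles
    atom, ring_closures = match.groups()
    rest = smiles[match.end():]
    return f"{atom}{ring_closures}([*]){rest}"


for smi in ["[*]CCCC", "[*]C1CC1", "[*][C@H]12CC1C2"]:
    print(smi, "-->", move_attachment_point(smi))
```

The helper is purely textual and does not re-check chirality tags, so stereo-labelled fragments should be verified with RDKit after the rewrite.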

================================================

>   3_weld_r_groups.py
       -templ [Core Scaffold SMILES with attachment points marked by "x" (eg: CxxCx=C)]
       -r     [CSV file of R-groups with attachment point marked by "[*]", generated by 2_parse_mmpdb_frag.py]
       -out   [Output prefix]
       -raw        [Optional: Read in pre-generated analog intermediate file (pickle file)]
       -unsat_min  [Optional: Remove molecules with degree of unsaturation less than this (def: None)]
       -unsat_max  [Optional: Remove molecules with degree of unsaturation larger than this (def: None)]

e.g) >   x.py -templ core_template.smi -r mmpdb_frag.csv -out combinatorial_analogs \
              -raw analog_intermediate.pickle.bz2 \
              -unsat_min 2 -unsat_max 5

Goal: Create a combinatorial analog library of a core scaffold using a fragment library. During generation, the intermediates from the "Combine Core/R-group to generate molecule" step are saved to a pickle.bz2 file so they can be reloaded (via -raw) if a later step fails.

  1. Create all permutations of cores with different numbers of branch points, limited by a predefined number of allowed R-group carbon atoms
  2. Create all permutations of different numbers of R-groups, limited by a predefined number of allowed R-group carbon atoms
  3. Combine core scaffolds (with designated branch points) and R-groups (with branch points) into one molecule
  4. Remove molecules whose degree of unsaturation does not fit the criteria
  5. Remove duplicate SMILES (strings are tautomerized and canonicalized)
  6. Generate all possible stereoisomers (E/Z, diastereomers) and remove meso isomers (SMILES are tautomerized and canonicalized)
  7. Report results
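
The degree-of-unsaturation filter in step 4 can be sketched with the standard formula DoU = (2C + 2 + N - H - X) / 2, where X is the halogen count. This is an illustration only: how 3_weld_r_groups.py actually computes the value is not stated here, and the function names and the example molecules are hypothetical. The bounds are treated as inclusive, following the wording of -unsat_min and -unsat_max.

```python
def degree_of_unsaturation(c, h, n=0, x=0):
    """Rings plus pi bonds for a neutral C/H/N/O/halogen molecule.

    c, h, n, x are the carbon, hydrogen, nitrogen and halogen counts;
    oxygen and sulfur do not change the result.
    """
    return (2 * c + 2 + n - h - x) / 2


def within_unsat_limits(dou, unsat_min=None, unsat_max=None):
    """Return True if dou is not below unsat_min and not above unsat_max."""
    if unsat_min is not None and dou < unsat_min:
        return False
    if unsat_max is not None and dou > unsat_max:
        return False
    return True


# Hypothetical molecules given as atom counts.
molecules = {
    "benzene": dict(c=6, h=6),
    "cyclohexane": dict(c=6, h=12),
    "naphthalene": dict(c=10, h=8),
    "pyridine": dict(c=5, h=5, n=1),
}

# Equivalent to running with -unsat_min 2 -unsat_max 5.
kept = [
    name
    for name, counts in molecules.items()
    if within_unsat_limits(degree_of_unsaturation(**counts), 2, 5)
]
print(kept)
```

Leaving either bound as None disables that side of the filter, mirroring the (def: None) defaults of the two options.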

####################################################################