czodrowskilab/VSFlow

Some errors occurred while running vsflow preparedb


Hi,

[Screenshot: Snipaste_2022-06-23_22-48-42]

Traceback (most recent call last):
  File "vsflow", line 6, in <module>
    run.main()
  File "C:\Users.....\VSFlow-master\vslib\run.py", line 1560, in main
    args.func(args)
  File "C:\Users.....\VSFlow-master\vslib\run.py", line 1380, in prep_db
    prepare.gen_confs(mols, args.nconfs, seed, args.rms_thresh, "mol", nthreads)
  File "C:\Users....\VSFlow-master\vslib\prepare.py", line 207, in gen_confs
    Chem.EmbedMultipleConfs(mol_H, numConfs=nconfs, params=params)
RuntimeError: Invariant Violation
    expected match not found
    Violation occurred on line 1062 in file Code\GraphMol\DistGeomHelpers\Embedder.cpp
    Failed Expression: strippedMatch.size() == 1
    RDKIT: 2022.03.2
    BOOST: 1_74

Attached are the errors I get when running the vsflow preparedb command.
Could you please help me figure out what the problem is and how to fix it? Thanks.
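The call that fails in the traceback is RDKit's EmbedMultipleConfs, which surfaces the invariant violation as a RuntimeError and so stops the whole run. A minimal sketch, assuming RDKit is installed, of wrapping the embedding step per molecule so that failing molecules are collected and skipped instead of aborting everything. This is not VSFlow's actual gen_confs code: `embed_all` and `rdkit_embed` are hypothetical names, and the ETKDGv3 parameters and seed are assumptions. RDKit is imported inside `rdkit_embed` only so that `embed_all` stays usable without it.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

M = TypeVar("M")  # input molecule type
R = TypeVar("R")  # embedded molecule type


def embed_all(
    mols: Iterable[M], embed: Callable[[M], R]
) -> Tuple[List[R], List[Tuple[int, str]]]:
    """Embed every molecule, collecting failures instead of aborting the run.

    Returns (embedded, failed), where failed holds (index, error message) pairs.
    """
    embedded: List[R] = []
    failed: List[Tuple[int, str]] = []
    for idx, mol in enumerate(mols):
        try:
            embedded.append(embed(mol))
        except RuntimeError as err:  # RDKit raises Invariant Violations as RuntimeError
            failed.append((idx, str(err)))
    return embedded, failed


def rdkit_embed(mol, nconfs: int = 10, seed: int = 42):
    """Hypothetical embedding step: add Hs, then generate conformers with ETKDGv3.

    `mol` must be a parsed RDKit Mol (not None).
    """
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol_h = Chem.AddHs(mol)
    params = AllChem.ETKDGv3()
    params.randomSeed = seed
    conf_ids = AllChem.EmbedMultipleConfs(mol_h, numConfs=nconfs, params=params)
    if len(conf_ids) == 0:
        # embedding can also fail without an exception, by returning no conformers
        raise RuntimeError("no conformers generated")
    return mol_h
```

With a list of parsed molecules, `embedded, failed = embed_all(mols, rdkit_embed)` would return the embedded molecules plus the indices and messages of the ones that failed, which can then be inspected or dropped.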

We are looking into it.

Not sure whether we ever tested VSFlow on a Windows machine (though it's possible).

Could you specify what your input looked like: what file type was it (SDF, SMILES in a text file, etc.), and which molecules did you want to prepare a database from?
Thanks


Thanks for your reply.
I used the molecules in the SDF file below to prepare the database:
Components-pub.sdf.gz
However, the errors I mentioned above cause the script to stop.
Could you please help me check what is causing them?

Many thanks

These are compounds from the PDB, right? The problem might be that some of them are metal complexes (e.g. heme, the HEM ligand in the PDB), which the underlying RDKit may not process properly. We may add an option to skip these compounds in the next version. In the meantime, you could try filtering out all metal-containing ligands first and then re-running the preparedb step.
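A minimal sketch of the suggested pre-filter, assuming RDKit is installed and the input is the gzipped SDF mentioned above. This is not a VSFlow option: `NON_METALS`, `has_metal` and `filter_metal_sdf` are hypothetical names, and the definition of "metal" used here (any element that is not a non-metal or metalloid) is an assumption. RDKit is imported inside `filter_metal_sdf` so the element check can be used on its own.

```python
import gzip
from typing import Iterable, Tuple

# Assumption: metalloids (B, Si, Ge, As, Sb, Te) are treated as non-metals;
# every element symbol NOT in this set counts as a metal.
NON_METALS = {
    "H", "He", "B", "C", "N", "O", "F", "Ne", "Si", "P", "S", "Cl", "Ar",
    "Ge", "As", "Se", "Br", "Kr", "Sb", "Te", "I", "Xe", "At", "Rn",
}


def has_metal(symbols: Iterable[str]) -> bool:
    """Return True if any element symbol falls outside NON_METALS."""
    return any(sym not in NON_METALS for sym in symbols)


def filter_metal_sdf(in_path: str, out_path: str) -> Tuple[int, int]:
    """Copy metal-free molecules from a gzipped SDF into a plain SDF.

    Returns (kept, skipped); records RDKit cannot parse are counted as skipped.
    """
    from rdkit import Chem

    kept = skipped = 0
    with gzip.open(in_path, "rb") as fh:
        writer = Chem.SDWriter(out_path)
        try:
            for mol in Chem.ForwardSDMolSupplier(fh):
                # skip unparsable records and anything containing a metal atom
                if mol is None or has_metal(a.GetSymbol() for a in mol.GetAtoms()):
                    skipped += 1
                    continue
                writer.write(mol)
                kept += 1
        finally:
            writer.close()
    return kept, skipped
```

For example, `filter_metal_sdf("Components-pub.sdf.gz", "components_no_metals.sdf")` (output name hypothetical) would return the kept and skipped counts, and the filtered SDF could then be passed to `vsflow preparedb -i`. Whether this removes every molecule that trips the embedder is not guaranteed.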

Thanks for the quick feedback.

So compounds that are metal complexes might be causing the errors. I will try again after removing those molecules.
Many thanks for your help.

Best,

Can you let us know once you have figured it out?
We will then close this issue.


Yep, I filtered the molecules using the LargestFragmentChooser method (available in RDKit and MolVS).

Here is some code for your reference:

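from rdkit import Chem
from rdkit.Chem import PandasTools
from rdkit.Chem.MolStandardize import rdMolStandardize
from tqdm import tqdm

# assumption: df was loaded from the SDF like this (not shown in the original snippet)
df = PandasTools.LoadSDF("Components-pub.sdf.gz", molColName="mol", idName="ID")

# keep only the largest fragment of each molecule
largest_Fragment = rdMolStandardize.LargestFragmentChooser()
df['largest_mol'] = [largest_Fragment.choose(m) for m in tqdm(df.mol)]
df['largest_smiles'] = [Chem.MolToSmiles(m) for m in tqdm(df.largest_mol)]

# write a space-separated "SMILES ID" file with no header or index
df[['largest_smiles', 'ID']].to_csv("./data/pdb_ligs001.smi",
                                    header=False, index=False, sep=' ')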
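Before running preparedb, a quick sanity check on the written file may help catch rows with a missing SMILES or ID. This sketch assumes the layout the snippet above writes (one whitespace-separated `SMILES ID` pair per line); `bad_smi_lines` is a hypothetical helper, not part of VSFlow or RDKit.

```python
from typing import Iterable, List


def bad_smi_lines(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of lines that are not exactly two fields (SMILES, ID)."""
    bad: List[int] = []
    for lineno, line in enumerate(lines, start=1):
        # str.split() with no argument splits on any whitespace and drops empties
        if len(line.split()) != 2:
            bad.append(lineno)
    return bad
```

Usage: `with open("./data/pdb_ligs001.smi") as fh: print(bad_smi_lines(fh))` lists the line numbers to inspect; an empty list means every line has exactly two fields.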

Then I ran the vsflow preparedb script:
vsflow preparedb -i ./data/pdb_ligs001.smi -o ./data/PDB_ligands3D -c --nconfs 10 --rms_thresh 0.3

PDB_ligands3D.vsdb was generated successfully.

Done!
Many thanks for your help.

Best