deepmodeling/Uni-Mol

problems running demo files


Hi,
The demo.sh and variations have this argument:
--model-dir checkpoint_best.pt

This file does not exist. If I replace it by:
--model-dir ../weights/unimol_docking_v2_240517.pt
I get errors that seem to be caused by the script trying to open files named after each individual letter of: ligand_predict

(unimol) christian@christian-linux02:/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface$ ./demo1.sh
Namespace(model_dir='../weights/unimol_docking_v2_240517.pt', input_protein='../example_data/protein.pdb', input_ligand='../example_data/ligand.sdf', input_batch_file='input_batch.csv', input_docking_grid='../example_data/docking_grid.json', output_ligand_name='ligand_predict', output_ligand_dir='predict_sdf', mode='single', batch_size=4, nthreads=8, conf_size=10, cluster=True, use_current_ligand_conf=False, steric_clash_fix=True)
Start preprocessing data...
Number of ligands: 1
1it [00:01, 1.37s/it]
Total num: 1, Success: 1, Failed: 0
Done!
2024-06-24 00:13:10 | INFO | unimol.inference | loading model(s) from ../weights/unimol_docking_v2_240517.pt
2024-06-24 00:13:10 | INFO | unimol.tasks.docking_pose_v2 | ligand dictionary: 30 types
2024-06-24 00:13:10 | INFO | unimol.tasks.docking_pose_v2 | pocket dictionary: 9 types
2024-06-24 00:13:11 | INFO | unimol.inference | Namespace(no_progress_bar=False, log_interval=50, log_format='simple', tensorboard_logdir='', wandb_project='', wandb_name='', seed=1, cpu=False, fp16=True, bf16=False, bf16_sr=False, allreduce_fp32_grad=False, fp16_no_flatten_grads=False, fp16_init_scale=4, fp16_scale_window=256, fp16_scale_tolerance=0.0, min_loss_scale=0.0001, threshold_loss_scale=None, user_dir='/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol', empty_cache_freq=0, all_gather_list_size=16384, suppress_crashes=False, profile=False, ema_decay=-1.0, validate_with_ema=False, loss='docking_pose_v2', optimizer='adam', lr_scheduler='fixed', task='docking_pose_v2', num_workers=8, skip_invalid_size_inputs_valid_test=False, batch_size=4, required_batch_size_multiple=1, data_buffer_size=10, train_subset='train', valid_subset='ligand_predict', validate_interval=1, validate_interval_updates=0, validate_after_updates=0, fixed_validation_seed=None, disable_validation=False, batch_size_valid=4, max_valid_steps=None, curriculum=0, distributed_world_size=1, distributed_rank=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, device_id=0, distributed_no_spawn=False, ddp_backend='c10d', bucket_cap_mb=25, fix_batches_to_gpus=False, find_unused_parameters=False, fast_stat_sync=False, broadcast_buffers=False, nprocs_per_node=1, path='../weights/unimol_docking_v2_240517.pt', quiet=False, model_overrides='{}', results_path='/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predict_sdf', arch='docking_pose_v2', recycling=4, data='/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predict_sdf', finetune_mol_model=None, finetune_pocket_model=None, conf_size=10, dist_threshold=8.0, max_pocket_atoms=256, adam_betas='(0.9, 0.999)', adam_eps=1e-08, weight_decay=0.0, force_anneal=None, lr_shrink=0.1, warmup_updates=0, no_seed_provided=False, mol=Namespace(encoder_layers=15, encoder_embed_dim=512, 
encoder_ffn_embed_dim=2048, encoder_attention_heads=64, dropout=0.1, emb_dropout=0.1, attention_dropout=0.1, activation_dropout=0.0, pooler_dropout=0.0, max_seq_len=512, activation_fn='gelu', pooler_activation_fn='tanh', post_ln=False, masked_token_loss=-1.0, masked_coord_loss=-1.0, masked_dist_loss=-1.0, x_norm_loss=-1.0, delta_pair_repr_norm_loss=-1.0), pocket=Namespace(encoder_layers=15, encoder_embed_dim=512, encoder_ffn_embed_dim=2048, encoder_attention_heads=64, dropout=0.1, emb_dropout=0.1, attention_dropout=0.1, activation_dropout=0.0, pooler_dropout=0.0, max_seq_len=512, activation_fn='gelu', pooler_activation_fn='tanh', post_ln=False, masked_token_loss=-1.0, masked_coord_loss=-1.0, masked_dist_loss=-1.0, x_norm_loss=-1.0, delta_pair_repr_norm_loss=-1.0), encoder_layers=15, encoder_embed_dim=512, encoder_ffn_embed_dim=2048, encoder_attention_heads=64, dropout=0.1, emb_dropout=0.1, attention_dropout=0.1, activation_dropout=0.0, pooler_dropout=0.0, max_seq_len=512, activation_fn='gelu', pooler_activation_fn='tanh', post_ln=False, masked_token_loss=-1.0, masked_coord_loss=-1.0, masked_dist_loss=-1.0, x_norm_loss=-1.0, delta_pair_repr_norm_loss=-1.0, distributed_num_procs=1)
2024-06-24 00:13:11 | INFO | unicore.tasks.unicore_task | get EpochBatchIterator for epoch 1
2024-06-24 00:13:14 | INFO | unimol.inference | Done inference!
Start converting model predictions into sdf files...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2032.12it/s]
Done!
0%| | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file d
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file t
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file e
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file p
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file r
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file c
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file i
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file _
5it [00:01, 3.34it/s] Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file s
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file d
[00:13:19] Counts line too short: '' on line4
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 514, in single_refine
Chem.SanitizeMol(in_lig, sanitizeOps=Chem.SanitizeFlags.SANITIZE_ALL, catchErrors=True)
Boost.Python.ArgumentError: Python argument types in
rdkit.Chem.rdmolops.SanitizeMol(NoneType)
did not match C++ signature:
SanitizeMol(RDKit::ROMol {lvalue} mol, unsigned long sanitizeOps=rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_ALL, bool catchErrors=False)
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file f
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file l
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file a
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file i
9it [00:03, 2.97it/s]Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file g
15it [00:03, 5.70it/s]Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file _
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file p
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file n
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file d
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file r
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file d
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file i
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file e
22it [00:05, 5.32it/s]Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file c
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in
single_refine(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
OSError: Bad input file t
26it [00:06, 4.02it/s]
output ligand path:
predict_sdf/ligand_predict.sdf
total time: 15.468194723129272 sec.
All processes done!

This bug has been fixed through issue #230. Please pull the latest code.
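
The fix itself is not shown in this thread. As a hedged illustration only: a typical way to get one `Bad input file <letter>` error per character is to iterate over (or map a worker across) a plain string where a list of names was expected, because a Python `str` is itself an iterable of one-character strings. The helper name `as_name_list` below is hypothetical and is not taken from the Uni-Mol code.

```python
from typing import List, Sequence, Union


def as_name_list(names: Union[str, Sequence[str]]) -> List[str]:
    """Wrap a bare string in a list so callers always iterate over whole names."""
    if isinstance(names, str):
        return [names]
    return list(names)


# Iterating over the bare string yields single characters ...
chars = [name for name in "ligand_predict"]

# ... while normalising first yields the one intended name.
fixed = [name for name in as_name_list("ligand_predict")]

print(chars[:3])  # ['l', 'i', 'g']
print(fixed)      # ['ligand_predict']
```

Whether this is what issue #230 actually changed is an assumption; the sketch only shows why the symptom looks the way it does.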

Now I can run this w/o problem:

python demo.py --mode single --conf-size 10 --cluster \
    --input-protein ../example_data/protein.pdb \
    --input-ligand ../example_data/ligand.sdf \
    --input-docking-grid ../example_data/docking_grid.json \
    --output-ligand-name ligand_predict \
    --output-ligand-dir predict_sdf \
    --steric-clash-fix \
    --model-dir ../weights/unimol_docking_v2_240517.pt

However, when I tried to store my files elsewhere, I started having problems: I cannot run with a full path and have to stay with a relative path. I was not able to run the script with my own files, so I moved ../example_data/ligand.sdf from the demo to test with.

This works:
--input-ligand ../Targets/ligand.sdf \

but this does not:
--input-ligand /media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf \

it gave me:

(unimol) christian@christian-linux02:/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface$ ./demo3.sh
Namespace(model_dir='../weights/unimol_docking_v2_240517.pt', input_protein='../example_data/protein.pdb', input_ligand='/media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf', input_batch_file='input_batch.csv', input_docking_grid='../example_data/docking_grid.json', output_ligand_name='ligand_predict', output_ligand_dir='predict_sdf', mode='single', batch_size=4, nthreads=8, conf_size=10, cluster=True, use_current_ligand_conf=False, steric_clash_fix=True)
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/demo.py", line 189, in
main_cli()
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/demo.py", line 185, in main_cli
main(args)
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/demo.py", line 24, in main
output_ligand) = clf.predict_sdf(
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predictor/unimol_predictor.py", line 76, in predict_sdf
output_pkl, output_lmdb = self.predict(input_protein,
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predictor/unimol_predictor.py", line 38, in predict
lmdb_name = self.preprocess(input_protein, input_ligand, input_docking_grid, output_ligand_name, output_ligand_dir)
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predictor/unimol_predictor.py", line 27, in preprocess
processed_data = preprocessor.preprocess(input_protein, input_ligand, input_docking_grid, output_ligand_name, output_ligand_dir)
File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predictor/processor.py", line 46, in preprocess
supp = Chem.SDMolSupplier(input_ligand)
OSError: File error: Bad input file /media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf

same thing with the protein file... The full path

    --input-protein /media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/example_data \

gives a different error:

TypeError: cannot unpack non-iterable NoneType object

and generates this file: failed_pocket.txt containing:
/ m e d i a / c h r i s t i a n / V S 1 / V S / V S _ U n i - M o l / u n i m o l _ d o c k i n g _ v 2 / e x a m p l e _ d a t a
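
Both symptoms have ordinary Python explanations, sketched here under assumptions because the full traceback is not shown. The content of `failed_pocket.txt` looks like a string joined with spaces (a `str` is joined character by character), and the `TypeError` is what tuple unpacking raises when a function returned `None` instead of a tuple. `load_pocket` is a hypothetical stand-in, not a function from the Uni-Mol code.

```python
import os
import tempfile
from typing import Optional, Tuple


def load_pocket(path: str) -> Optional[Tuple[str, int]]:
    """Hypothetical loader: returns (name, size) for a file, None otherwise."""
    if not os.path.isfile(path):
        return None  # failure is signalled by None instead of an exception
    return os.path.basename(path), os.path.getsize(path)


directory = tempfile.mkdtemp()  # a directory, standing in for example_data

# Joining a str inserts the separator between every character.
spaced = " ".join("/media/x")
print(spaced)  # / m e d i a / x

# Unpacking the None returned for a directory raises the TypeError.
try:
    name, size = load_pocket(directory)
except TypeError as exc:
    message = str(exc)
    print(message)
```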

> This works:
> --input-ligand ../Targets/ligand.sdf
> but this does not:
> --input-ligand /media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf \

I used the absolute path of the sdf file from the example data as --input-ligand and could not reproduce your error.

> File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predictor/processor.py", line 46, in preprocess
> supp = Chem.SDMolSupplier(input_ligand)
> OSError: File error: Bad input file /media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf

Based on the error message, the sdf file path is complete, and Chem.SDMolSupplier can accept absolute paths, but the file cannot be read from this path. Are you sure this path is correct? Or could you provide the sdf file you are using?
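
One way to answer that question is to find where an absolute path stops existing. Note that the failing argument reads `VS1/VS/VS/VS_Uni-Mol`, while the shell prompt above shows `VS1/VS/VS_Uni-Mol`, so an extra `VS` segment may be the difference. The sketch below uses only `pathlib`; `first_missing_component` is a hypothetical helper name, and the demonstration runs against a temporary directory rather than the paths from this issue.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def first_missing_component(path: str) -> Optional[Tuple[Path, str]]:
    """Return (deepest existing parent, first missing name), or None if path exists."""
    target = Path(path)
    if target.exists():
        return None
    # Walk upwards until an existing ancestor is found.
    missing = target
    while not missing.parent.exists():
        missing = missing.parent
    return missing.parent, missing.name


root = Path(tempfile.mkdtemp())
(root / "VS" / "Targets").mkdir(parents=True)
(root / "VS" / "Targets" / "ligand.sdf").write_text("")

good = root / "VS" / "Targets" / "ligand.sdf"
bad = root / "VS" / "VS" / "Targets" / "ligand.sdf"  # one directory level too many

print(first_missing_component(str(good)))  # None
# For the bad path: deepest existing parent is <root>/VS, missing name is 'VS'.
print(first_missing_component(str(bad)))
```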

> same thing with the protein file... The full path
>
>     --input-protein /media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/example_data \

--input-protein needs to be a PDB file, not a directory.
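
A small pre-flight check can turn this mistake into a clear message before any preprocessing starts. This is a sketch assuming the expected extensions are `.pdb` and `.sdf`, as in the demo command; `require_file` is a hypothetical helper and not part of `demo.py`.

```python
import tempfile
from pathlib import Path


def require_file(path: str, suffix: str) -> Path:
    """Return the path if it is a regular file with the given suffix, else raise."""
    p = Path(path)
    if p.is_dir():
        raise IsADirectoryError(f"{p} is a directory; expected a {suffix} file")
    if not p.is_file():
        raise FileNotFoundError(f"{p} does not exist")
    if p.suffix.lower() != suffix:
        raise ValueError(f"{p} does not end in {suffix}")
    return p


data_dir = Path(tempfile.mkdtemp())
protein = data_dir / "protein.pdb"
protein.write_text("END\n")

print(require_file(str(protein), ".pdb").name)  # protein.pdb

# Passing the directory instead of the file is rejected with a clear message.
try:
    require_file(str(data_dir), ".pdb")
except IsADirectoryError as exc:
    print("rejected:", exc)
```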

If I run this it works

python demo.py --mode single --conf-size 10 --cluster \
    --input-protein /media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/example_data/protein.pdb \
    --input-ligand /media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/example_data/ligand.sdf \
    --input-docking-grid ../example_data/docking_grid.json \
    --output-ligand-name ligand_predict \
    --output-ligand-dir predict_sdf \
    --steric-clash-fix \
    --model-dir ../weights/unimol_docking_v2_240517.pt

So it seems to be OK with a full path. For protein.pdb, I should have pointed directly to the file and not to the directory. But I'm not sure what went wrong when I just changed the ligand from:
--input-ligand ../Targets/ligand.sdf
to
--input-ligand /media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf \

As long as it works with the demo files, it should work with mine. I'll figure it out; it is probably just a detail. I ran 134 ligands vs 1 receptor using relative paths and it worked well, at around 3.5 s/pose. The poses are very similar to what I get with DiffBindFR, which is 20-40x slower. Thank you very much for your help and this very useful tool.