deepmodeling/Uni-Mol

unimol_tools issue about “molecule property prediction”

Opened this issue · 7 comments

Hello,
I would like to understand this code from the unimol_tools molecule property prediction example:
`
from unimol_tools import MolTrain, MolPredict
clf = MolTrain(task='classification',
               data_type='molecule',
               epochs=10,
               batch_size=16,
               metrics='auc',
               )
pred = clf.fit(data = data)
# currently support data with smiles based csv/txt file, and
# custom dict of {'atoms':[['C','C'],['C','H','O']], 'coordinates':[coordinates_1,coordinates_2]}

clf = MolPredict(load_model='../exp')
res = clf.predict(data = data)
`
This code is an API for using Uni-Mol, and it confuses me.
My other question is about the "molecule property prediction" function: why are there several versions of the code for it, with no description of how they differ?

MolTrain is used for training models with different types of data, including SMILES-based and 3D-coordinate-based input. For example, in bioactivity prediction you can use docking or FEP conformations as input, which are more suitable than SMILES-based input.
MolPredict provides prediction services using models trained with MolTrain. This means you can train your model with MolTrain and then use MolPredict for inference.
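For the 3D-coordinate route, the custom dict mentioned in the README comment pairs one list of atom symbols with one set of coordinates per molecule. The following is only a minimal sketch of how such a dict might be assembled and sanity-checked before it is passed as `data`. The helper name `build_custom_input` is hypothetical and not part of unimol_tools, and the sketch only checks that the two lists line up; it does not cover labels or anything else MolTrain may expect.

```python
from typing import Dict, List, Sequence


def build_custom_input(
    atoms: Sequence[Sequence[str]],
    coordinates: Sequence[Sequence[Sequence[float]]],
) -> Dict[str, List]:
    """Pack atom symbols and 3D coordinates into {'atoms': ..., 'coordinates': ...}.

    Hypothetical helper: it only verifies that every molecule has one
    (x, y, z) row per atom, then returns the dict layout quoted in the thread.
    """
    if len(atoms) != len(coordinates):
        raise ValueError("atoms and coordinates must list the same number of molecules")
    for i, (symbols, xyz) in enumerate(zip(atoms, coordinates)):
        if len(symbols) != len(xyz):
            raise ValueError(
                f"molecule {i}: {len(symbols)} atoms but {len(xyz)} coordinate rows"
            )
        if any(len(row) != 3 for row in xyz):
            raise ValueError(f"molecule {i}: every coordinate row needs x, y and z")
    return {"atoms": list(atoms), "coordinates": list(coordinates)}


# Example: a single water molecule with illustrative coordinates (in angstroms).
water_atoms = [["O", "H", "H"]]
water_xyz = [[[0.00, 0.00, 0.00], [0.96, 0.00, 0.00], [-0.24, 0.93, 0.00]]]
data = build_custom_input(water_atoms, water_xyz)
```

The resulting `data` has the same two keys as the dict in the README comment, so it could be handed to `clf.fit(data=data)` in place of a CSV path, assuming the rest of the expected fields are supplied.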

I get an error when I use this code to predict on 'mol_test.csv'. The details are below. How can I fix this?
[image]
python shi.py
2024-06-27 06:08:16.615493: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-06-27 06:08:16.662706: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "shi.py", line 22, in <module>
    clf = MolPredict(load_model='./weights')
  File "/sunjs/Softwares/Uni-Mol-main/unimol_tools/unimol_tools/predict.py", line 34, in __init__
    self.config = YamlHandler(config_path).read_yaml()
  File "/sunjs/Softwares/Uni-Mol-main/unimol_tools/unimol_tools/utils/config_handler.py", line 24, in __init__
    raise FileExistsError(OSError)
FileExistsError: <class 'OSError'>

You should load the model from your save_path:
MolPredict(load_model='./exp')
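The traceback shows MolPredict failing while `YamlHandler(config_path)` tries to read a config file from the `load_model` directory, so one way to catch this earlier is to check the directory before constructing MolPredict. This is only a sketch: the helper `missing_model_files` is hypothetical, and the file name `config.yaml` is an assumption that is not confirmed anywhere in this thread.

```python
import os
from typing import List, Sequence


def missing_model_files(
    model_dir: str,
    required: Sequence[str] = ("config.yaml",),  # assumed name, adjust as needed
) -> List[str]:
    """Return the names in `required` that are not regular files in model_dir."""
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


# Usage: check first, and only build MolPredict if nothing is missing.
# missing = missing_model_files("./exp")
# if missing:
#     print("Not a fine-tuned model directory, missing:", missing)
```

A directory that holds only pretrained weights would be reported as incomplete by this check, which matches the advice to point `load_model` at the directory written during training.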

Yes, "./weights" is my model directory:
[image]
Would it be better to use the "./exp" directory based on your advice?
Or is there any other advice that I haven't considered?

Use './weights' for the initial pretrained weights, which are the default weights provided by UniMol. For your fine-tuned model weights, use './exp'. If you only need to utilize the representation capabilities of UniMol, you can simply use UniMolRepr:

from unimol_tools import UniMolRepr
# single smiles unimol representation
clf = UniMolRepr(data_type='molecule', remove_hs=False)
smiles = 'c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]'
smiles_list = [smiles]
unimol_repr = clf.get_repr(smiles_list, return_atomic_reprs=True)

If you want to train a model on your own dataset, the best practice is:

  1. Fit your own data with MolTrain;
  2. Predict with your trained model by loading it with MolPredict from your save path, such as the './exp' folder here.
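The two steps above can be sketched as a single function in which the directory used for training is the same one passed to `load_model`. The wrapper `train_then_predict` is hypothetical, and passing `save_path` to MolTrain is an assumption based on the "save_path" mentioned earlier in this thread; the remaining arguments are the ones from the example in the question.

```python
def train_then_predict(train_data, test_data, save_path="./exp"):
    """Fine-tune on train_data, then reload the saved model and predict on test_data."""
    # Imported lazily so this sketch can be read and imported without unimol_tools.
    from unimol_tools import MolTrain, MolPredict

    trainer = MolTrain(
        task="classification",
        data_type="molecule",
        epochs=10,
        batch_size=16,
        metrics="auc",
        save_path=save_path,  # assumption: where the fine-tuned model is written
    )
    trainer.fit(data=train_data)

    # Load from the directory written during training, not from './weights'.
    predictor = MolPredict(load_model=save_path)
    return predictor.predict(data=test_data)
```

Keeping one `save_path` variable for both calls avoids the mismatch seen above, where MolPredict was pointed at a directory that the training step had not written.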

Yes, I used that code:
`
from unimol_tools import UniMolRepr
# single smiles unimol representation
clf = UniMolRepr(data_type='molecule', remove_hs=False)
smiles = 'c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]'
smiles_list = [smiles]
unimol_repr = clf.get_repr(smiles_list, return_atomic_reprs=True)
`
There is an error:
[image]
Is this right? How can I fix it?

It seems the SMILES is invalid for generating conformations.
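Since the reply points at the SMILES string itself, a cheap syntactic pre-check before calling `get_repr` can catch strings that were damaged in copying, for instance when the brackets around a charged atom are lost. The sketch below is not a SMILES parser: the helper `smiles_syntax_problems` is hypothetical and only checks balanced brackets and parentheses and that `+` appears inside a bracket atom. A string that passes may still fail to parse or to embed in 3D, which only a full cheminformatics toolkit can confirm.

```python
from typing import List


def smiles_syntax_problems(smiles: str) -> List[str]:
    """Return a list of obvious syntax problems; an empty list means none were found."""
    problems = []
    paren_depth = 0
    in_bracket = False
    for pos, ch in enumerate(smiles):
        if ch == "[":
            if in_bracket:
                problems.append(f"nested '[' at position {pos}")
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                problems.append(f"unmatched ']' at position {pos}")
            in_bracket = False
        elif ch == "(" and not in_bracket:
            paren_depth += 1
        elif ch == ")" and not in_bracket:
            if paren_depth == 0:
                problems.append(f"unmatched ')' at position {pos}")
            else:
                paren_depth -= 1
        elif ch == "+" and not in_bracket:
            # In SMILES a positive charge is only valid inside a bracket atom.
            problems.append(f"'+' outside a bracket atom at position {pos}")
    if in_bracket:
        problems.append("unclosed '['")
    if paren_depth > 0:
        problems.append("unclosed '('")
    return problems


intact = "c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]"
damaged = "c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)N+[O]"  # brackets around N+ lost
print(smiles_syntax_problems(intact))
print(smiles_syntax_problems(damaged))
```

Filtering `smiles_list` with a check like this before `clf.get_repr(...)` makes it easier to tell a malformed input string apart from a genuine conformer-generation failure.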