luwei0917/TankBind

The exact size of the PDBBind training set

Closed this issue · 2 comments

Hi Wei!
I'm trying to use construction_PDBbind_training_and_test_dataset.ipynb to process the PDBBind dataset manually, but I found several inconsistencies that confuse me.

  1. The size of the training set reported in the newest version of your paper is 17,787, but the size of the training set output by construction_PDBbind_training_and_test_dataset.ipynb is 17,786.
  2. I ran the Jupyter notebook locally, and the number of ligand files readable by RDKit is 19,128 (the final processed training set then has 17,795 entries). But the same cell in your original notebook outputs 19,119. Is this caused by the RDKit version (I used 2022.03.5, installed through pip) or something else?
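For reference, the count in item 2 can be reproduced outside the notebook with something like the sketch below. It is not the notebook's own code: the helper names `read_ligand` and `readable_ids` are hypothetical, and the `{root}/{pdb_id}/{pdb_id}_ligand.sdf` / `.mol2` layout is an assumption about how the PDBBind download is unpacked. Only `Chem.MolFromMolFile` and `Chem.MolFromMol2File` are real RDKit calls.

```python
from typing import Callable, Iterable, List, Optional


def read_ligand(root: str, pdb_id: str) -> Optional[object]:
    """Parse one ligand with RDKit, trying .sdf first and .mol2 as a fallback.

    The path layout {root}/{pdb_id}/{pdb_id}_ligand.* is an assumption.
    """
    from rdkit import Chem  # lazy import: only needed when this reader is used

    mol = Chem.MolFromMolFile(f"{root}/{pdb_id}/{pdb_id}_ligand.sdf")
    if mol is None:
        # RDKit returns None when it cannot parse or sanitize the molecule.
        mol = Chem.MolFromMol2File(f"{root}/{pdb_id}/{pdb_id}_ligand.mol2")
    return mol


def readable_ids(
    ids: Iterable[str], reader: Callable[[str], Optional[object]]
) -> List[str]:
    """Return the IDs for which `reader` yields a molecule (not None, no error)."""
    ok = []
    for pdb_id in ids:
        try:
            mol = reader(pdb_id)
        except Exception:
            # Treat any reader error (e.g. a missing file) as "unreadable".
            mol = None
        if mol is not None:
            ok.append(pdb_id)
    return ok
```

Calling `readable_ids(ids, lambda i: read_ligand(root, i))` and printing `len(...)` next to `rdkit.__version__` gives a count that can be compared directly across RDKit installs.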

Could be. I was using RDKit 2021.03.4. Since the difference is small, I guess either way is OK.
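Both gaps reported above are 9 entries (19,128 vs. 19,119 readable ligands, and 17,795 vs. 17,786 training samples), which is consistent with a parsing difference that carries through the rest of the pipeline. One way to check this, sketched here with a hypothetical helper `diff_ids`, is to save the list of readable PDB IDs under each RDKit version and compare the two lists:

```python
from typing import Dict, Iterable, List


def diff_ids(mine: Iterable[str], reference: Iterable[str]) -> Dict[str, List[str]]:
    """Report which IDs appear in only one of two runs.

    `mine` and `reference` are the readable-ligand ID lists saved from each
    environment (for example, one per RDKit version).
    """
    mine_set, ref_set = set(mine), set(reference)
    return {
        "only_mine": sorted(mine_set - ref_set),
        "only_reference": sorted(ref_set - mine_set),
    }


# Toy example with made-up IDs, not real PDBBind entries.
result = diff_ids(["1abc", "2xyz", "3foo"], ["1abc", "2xyz"])
print(result)
```

A net difference of 9 does not have to mean exactly 9 extra ligands on one side, so looking at both `only_mine` and `only_reference` is more informative than comparing counts alone.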

Thanks for your quick reply!