luwei0917/TankBind

Difference in reported experiment results

Opened this issue · 27 comments

There is a difference between the experiment results reported in the paper and the results from the GitHub code.
Are the paper results from TankBind v1.0.1 and the GitHub results from TankBind v0.5.0?

The results from the GitHub model should be about the same as those reported in the preprint; we only slightly updated the coordinate-generation part of the code. v1.0.1 is not released yet.

@luwei0917
The model outputs the same affinity value for different structures of the same ligand: the RDKit structure, the PDBbind structure, and the structures predicted by TankBind.
Is this normal?

Yes. In my opinion, that's the beauty of TankBind: you don't need prior knowledge of the complex structure to predict the affinity.

@luwei0917
That sounds awesome.
I am curious how TankBind achieves a conformation-agnostic affinity prediction, i.e., how the input ligand coordinates can have no influence on the output affinity.
And if that is so, how can TankBind update the ligand coordinates to improve the docked pose?

The coordinates of the ligand are determined by its interaction with the protein and by the constraints imposed by chemical bonds and excluded-volume effects.
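For intuition, here is a minimal sketch of that idea: given a predicted ligand-protein distance map and LAS-constrained intra-ligand distances, ligand coordinates can be recovered by gradient descent. All names below are illustrative, not TankBind's actual API.

import torch

def optimize_ligand_coords(pred_inter_dist, protein_coords,
                           pred_intra_dist, las_mask,
                           n_steps=500, lr=0.1):
    """Toy coordinate generation (illustrative, not TankBind's code):
    find ligand coordinates whose pairwise distances match
    (a) the predicted ligand-protein distance map and
    (b) the LAS-masked intra-ligand distances."""
    n_lig = pred_inter_dist.shape[0]
    coords = torch.randn(n_lig, 3, requires_grad=True)  # random initialization
    opt = torch.optim.Adam([coords], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        d_inter = torch.cdist(coords, protein_coords)  # ligand-protein distances
        d_intra = torch.cdist(coords, coords)          # intra-ligand distances
        loss = ((d_inter - pred_inter_dist) ** 2).mean() \
             + ((d_intra - pred_intra_dist)[las_mask] ** 2).mean()
        loss.backward()
        opt.step()
    return coords.detach()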

@luwei0917

  1. Predicting coordinates:
     Is D^c_{j,k} in Equation (5) the ground-truth distance from PDBbind?
     If so, is the label leaked?
  2. Predicting affinity:
     Is D_{ij} in L_distance around Equation (4) the ground-truth distance from PDBbind?
     If so, no matter what conformation is input to the model, can the best conformation always be
     obtained, so that the predicted affinity is always based on the best conformation?

  1. It is based on the RDKit-generated conformation, not the PDBbind one, when testing or inferencing; it is based on the PDBbind conformation during training.
     A side note: even during training, under the self-dock setting, we also impose a LAS mask on D^c_{j,k}.
  2. This is the loss computed for training the model: we want our predicted D_{ij} to be as close to the ground truth as possible.
     Affinity prediction is always based on the predicted interaction embedding.

The conformation of the ligand is predicted, not given.
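For concreteness, a toy version of such a distance loss might look like the following; the cutoff value and the use of plain MSE are assumptions, not the paper's exact L_distance.

import torch

def distance_loss(pred_dist, true_dist, cutoff=10.0):
    """MSE between predicted and ground-truth ligand-protein distances,
    restricted to pairs within a cutoff (distant pairs say little about the pose)."""
    mask = true_dist < cutoff
    return ((pred_dist - true_dist)[mask] ** 2).mean()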

@luwei0917

  1. "It is based on the RDKit generated conformation when testing or inferencing." This will make the conformation closer to RDKit, i.e., farther to PDB conformation?

In self dock the LAS mask is applied, so only the basic chemical constraints are imposed; the overall conformation is determined by the interaction.
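For intuition, a rough sketch of what such a LAS mask could look like, based on my reading of the paper rather than the repository's exact code: pairs whose distance is fixed by a bond, a bond angle, or membership in the same small ring are masked, and everything else is left to the predicted interactions.

import numpy as np
from rdkit import Chem

def las_mask(mol):
    """Boolean (n_atoms, n_atoms) mask, True where the pairwise distance is
    (approximately) fixed by local chemistry (illustrative sketch)."""
    n = mol.GetNumAtoms()
    adj = Chem.GetAdjacencyMatrix(mol)   # integer adjacency matrix
    mask = np.eye(n, dtype=bool)
    mask |= adj > 0                      # 1-2 pairs: directly bonded
    mask |= (adj @ adj) > 0              # 1-3 pairs: fixed by bond angles
    for ring in Chem.GetSymmSSSR(mol):   # small rings (not always rigid!)
        idx = list(ring)
        for i in idx:
            for j in idx:
                mask[i, j] = True
    return mask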

@luwei0917
Is TankBind SE(3)-equivariant?

Yes, I think so: relative distances are SE(3)-invariant, so the predictions do not change under rigid motions of the input.
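A quick numeric check of that claim (assuming numpy and scipy are available):

import numpy as np
from scipy.spatial.transform import Rotation

coords = np.random.rand(10, 3)                   # arbitrary point cloud
R = Rotation.random().as_matrix()                # random rotation
t = np.random.rand(3)                            # random translation
moved = coords @ R.T + t
dist = lambda x: np.linalg.norm(x[:, None] - x[None, :], axis=-1)
print(np.abs(dist(coords) - dist(moved)).max())  # ~1e-15: distances are unchanged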


LAS includes rings, and the prediction example in your code contains an N-O-C ring; however, that ring is not rigid, so shouldn't it be excluded from the LAS mask?

Are the protein's GVP embeddings and the ligand's TorchDrug features invariant or equivariant?

I used "Chem.GetSymmSSSR" function in RDKit to get the ring. Could be that some ring should be rigid but considered flexible, or other way around. If you have better solution, please let me know.

I used "Chem.GetSymmSSSR" function in RDKit to get the ring. Could be that some ring should be rigid but considered flexible, or other way around. If you have better solution, please let me know.

Chem.GetSymmSSSR returns the symmetrized smallest set of smallest rings, which are not necessarily rigid. So could we keep only those "Chem.GetSymmSSSR" rings that have a single RDKit conformation?

Right, but how could I know whether a ring has only a single RDKit conformation?

from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np

def generate_conformation(mol):
    """Embed one 3D conformer with ETKDGv2 and relax it with MMFF;
    fall back to 2D coordinates if embedding or optimization fails."""
    mol = Chem.AddHs(mol)
    ps = AllChem.ETKDGv2()
    try:
        # EmbedMolecule returns -1 on failure; the optimizer then raises
        AllChem.EmbedMolecule(mol, ps)
        AllChem.MMFFOptimizeMolecule(mol, maxIters=500, confId=0)
    except Exception:
        AllChem.Compute2DCoords(mol)
    mol = Chem.RemoveHs(mol)
    return mol

def generate_coords(smiles='c1ccccc1'):
    """Return the 3D coordinates of one RDKit-generated conformer."""
    mol = Chem.MolFromSmiles(smiles)
    mol = generate_conformation(mol)
    return mol.GetConformer().GetPositions()
    
def rigid_transform_3d(A, B):
    """Find the rotation matrix R (3x3) and translation vector t that best
    align point set A onto point set B (Kabsch algorithm).
    Align A to B with: np.matmul(A, R.T) + t.reshape([1, 3])
    """
    assert len(A) == len(B), "Point sets must have the same length"
    centroid_A = np.mean(A, axis=0)
    centroid_B = np.mean(B, axis=0)
    # centre both point sets
    AA = A - centroid_A
    BB = B - centroid_B
    H = np.matmul(AA.T, BB)  # cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    R = np.matmul(Vt.T, U.T)
    # reflection case: flip the sign of the last singular vector
    if np.linalg.det(R) < 0:
        Vt[2, :] *= -1
        R = np.matmul(Vt.T, U.T)
    t = -np.matmul(R, centroid_A) + centroid_B
    return R, t
    
def get_err(smiles='c1ccccc1'):
    """Embed the molecule twice and return the mean absolute coordinate
    error after optimally superimposing the two conformers."""
    A, B = generate_coords(smiles), generate_coords(smiles)
    R, t = rigid_transform_3d(A, B)
    err = B - np.matmul(A, R.T) - t.reshape([1, 3])
    return np.mean(np.abs(err))

def rigid_ring(smiles='c1ccccc1'):
    """Treat a ring as rigid if two independent embeddings agree up to a
    rigid transform, e.g.:
      get_err('c1ccccc1')  # ~1e-6 (benzene: rigid)
      get_err('C1CNCOC1')  # ~0.158 (flexible)
    """
    return get_err(smiles) <= 1e-3
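Building on that, one could filter the GetSymmSSSR rings through the rigid_ring test before adding them to the LAS mask. This combination is only a suggestion; the fragment SMILES of a bare ring may occasionally fail to parse.

def rigid_rings_only(mol):
    """Return atom-index lists for the GetSymmSSSR rings whose isolated
    fragment passes the rigid_ring conformer test above (illustrative)."""
    keep = []
    for ring in Chem.GetSymmSSSR(mol):
        idx = list(ring)
        frag = Chem.MolFragmentToSmiles(mol, atomsToUse=idx)  # crude rigidity probe
        if Chem.MolFromSmiles(frag) is not None and rigid_ring(frag):
            keep.append(idx)
    return keep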


The checkpoint and prediction code cannot reproduce the results in the paper, especially the 'mean' column in Table 1. Have you tested this before? Any suggestions?
@luwei0917

Thanks for the "rigid_ring" function.
The numbers should be about the same; we tested this before.
I guess it could be an atom-index misalignment: the TorchDrug input is the SMILES, so the atom indices might differ from those in the input SDF.
I used the following script to re-order the atoms:

# writing the canonical SMILES records the atom output order on the mol
sm = Chem.MolToSmiles(mol)
# '_smilesAtomOutputOrder' maps positions in the SMILES back to mol atom indices
m_order = list(mol.GetPropsAsDict(includePrivate=True, includeComputed=True)['_smilesAtomOutputOrder'])
mol = Chem.RenumberAtoms(mol, m_order)

If this is not the problem, can you send me your folder/scripts to wei.lu@galixir.com and I will take a look.
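To check that the re-ordering worked, one can compare atom symbols after renumbering against a fresh parse of the SMILES; the file name here is hypothetical.

from rdkit import Chem

mol = Chem.MolFromMolFile("ligand.sdf")  # hypothetical input SDF
sm = Chem.MolToSmiles(mol)
order = list(mol.GetPropsAsDict(includePrivate=True,
                                includeComputed=True)['_smilesAtomOutputOrder'])
mol = Chem.RenumberAtoms(mol, order)
ref = Chem.MolFromSmiles(sm)
# heavy atoms should now line up index-by-index with the SMILES parse
assert [a.GetSymbol() for a in mol.GetAtoms()] == [a.GetSymbol() for a in ref.GetAtoms()]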

I have solved the re-ordering problem, but still can't reproduce the results. An email with the code and the reproduced results has been sent to your address.

Hope to get your suggestion.


Thanks for your suggestions by email. There are still some diffs in the newly reproduced results; the details have been sent to your email address.

Hope to get your reply.

I am hitting an analogous issue.
There is a gap between the results we reproduced and those reported in the paper, especially for the RMSD metric.
@Anfankus @yufengwhy Were you able to reproduce the results later?

@Yuki0613 I uploaded the notebook for evaluating the test set.

I think the main difference is the data processing, because even when I use my own processed data, the performance diff remains. @Yuki0613
Can the code that processes PDBbind into the test data be provided? @luwei0917

  1. I used the data-processing code in this code base but got 3265 pocket samples in the test set; your notebook, however, shows 2879 pocket samples.
  2. What does 'renumber_atom_index_same_as_smiles' in the notebook do?

I used the top-10 p2rank pockets; you probably used all pockets.
The "renumber_atom_index_same_as_smiles" folder is created by cell 10 of
https://github.com/luwei0917/TankBind/blob/ed4c87f47b8268e7ee4f4dc824656fdfe6883593/examples/construction_PDBbind_training_and_test_dataset.ipynb

Can the training code be updated?
For example, in data.py many of the .pt file names are mismatched with those in the construction_dataset notebook.
Moreover, there are two compound_dicts, new_dataset.compound_dict and all_pocket_test.compound_dict, but there is only one compound_dict in the construction_dataset notebook?
@luwei0917


I think new_dataset.compound_dict is built by the construction_PDBbind_training_and_test_dataset.ipynb notebook and saved as compound_torchdrug_features.pt. The other dict, all_pocket_test.compound_dict (its two occurrences should be the same), is built by the testset_evaluation_cleaned.ipynb notebook and saved as pdbbind_test_compound_dict_based_on_rdkit.pt.
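To see which dict is which, one can load both files and compare their keys (file names are those given above; I assume the keys are PDB codes):

import torch

feats = torch.load("compound_torchdrug_features.pt")               # new_dataset.compound_dict
test = torch.load("pdbbind_test_compound_dict_based_on_rdkit.pt")  # all_pocket_test.compound_dict
print(len(feats), len(test))
print(set(test) - set(feats))  # test-set entries missing from the features dict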