chemosim-lab/ProLIF

Key errors in vdwradii

Closed this issue · 2 comments

Hi development team

I am using ProLIF to investigate the interaction between my ligand (vitamin B12) and protein.
I faced an issue when using ProLIF. This occurred when I tried to generate the fingerprint between the ligand and protein.
Screenshot 2024-10-23 at 14 27 37

I found it is because the vdwradii of the cobalt ion (which is in the center of my ligand vitamin B12) is not defined. Then, I found the default dictionary of the vdwradii is imported from MDAnalysis.

Thus, I created a new dictionary, inherited from MDAnalysis's vdwradii, with the vdw radius of the other elements from rdkit (see below). Finally, I sorted it out.

import prolif as plf
from rdkit.Chem import GetPeriodicTable
from MDAnalysis.topology.tables import vdwradii

ptb = GetPeriodicTable()
elements = {}
i = 0
while True:
    i += 1
    try:   
        elements[i] = ptb.GetElementSymbol(i).upper()
    except:
        break

vdwradii_rdkit = {elements[each_atnum]: ptb.GetRvdw(each_atnum) for each_atnum in elements}

vdwradii_new = {}
for each_element in vdwradii_rdkit.keys():
    if each_element not in vdwradii.keys():
        vdwradii_new[each_element.capitalize()] = vdwradii_rdkit[each_element]
    else:
        vdwradii_new[each_element.capitalize()] = vdwradii[each_element]
# fingerprint calculation
fp = plf.Fingerprint(
    parameters={
    "VdWContact": {
        "tolerance": 0.0,
        "vdwradii": vdwradii_new
}})

fp.run(u.trajectory, ligand, protein)

Screenshot 2024-10-23 at 15 17 28

I am thinking it is possible to add the vdwradii of most atoms in the chemical periodic table in the default setting so that people (especially for some investigating hemeglobins or other proteins with a ligand having transition metal) won't get some errors and need to check the source codes (as I did).

Hi @yuyuan871111,

It's indeed possible to include more vdw radii to the defaults, my only worry is starting to mix different sources for it.
I use the values from MDAnalysis which come from Bondi 1964, while RDKit uses data originally defined in the Blue Obelisk Data Repository which I think mostly come from Batsanov 2001. There's also this more recent paper that proposes yet another set of values for even more elements.
Here's a plot with the difference in absolute values between MDAnalysis and RDKit:
image

I could add the missing values by taking them from RDKit like you did, but then I'd be mixing things that can be quite inconsistent. I'd rather leave it to users to make a deliberate decision on that.

However I think there's definitely some improvements to be made in terms of user-friendliness, so I might just add a new parameter to switch between the vdw radii as available in mdanalysis, rdkit and the other paper's values, and add some more documentation on that, so that people don't have to manually add new radii.

Best,
Cédric

Thank you!!