mittinatten/freesasa

Wrong SASA after setting different atom radius

fabiotrovato opened this issue · 17 comments

Hi,

I was testing the attached code, which takes in input a pdb file (one Calpha atom and nothing else) and sets its radius to 3.8. I get the wrong sasa value, corresponding to a smaller Calpha radius. Here it is the output from the script.

output:
CA radius 3.8
probe radius 1.4
1
CA
135.194041618

I followed the instructions on https://freesasa.github.io/python/intro.html#basic-calculations. Can you please explain why such a simple case is not working as I would expect (the sasa should be 339.8)? I have attached everything so you can try to reproduce what I see.

Thank you,
Fabio

freesasa_test.py.gz
test.pdb.gz

A workaround that I have found is to pass a list of radii to freesasa.Structure. In this case I recover the correct sasa. Please see below:

cg_struc = freesasa.Structure("test.pdb")
rad_list = [3.8 for i in range(cg_struc.nAtoms())]
cg_struc.setRadii(rad_list)
cg_result = freesasa.calc(cg_struc)
print cg_result.atomArea(idx)

output:
339.794661412

Along the same lines of the atomic radii ...

I have issues also with freesasa.calcCoord(). In this case I am running a script on a one-residue pdb structure and ~ 350 residue pdb structure (they contain all atoms, not only Calphas). Since I want to use only Calphas, I input Calphas coordinates and radii as per freesasa.calcCoord() specs.
What I observe is that for the 350 residue protein the sasa in underestimated. For the one-residue case the sasa is correct. How is this possible if I am using exactly the same code on both structures?

To confirm that the calculation on the 350 residue protein is weird, I have modified the pdb file of the 350 residue protein in order to store ONLY Calphas. Then I used the code I've posted in the second message. The sasa of each Calpha now looks better, as it is larger.

Hi Fabio,
I won't be able to test anything myself until earliest tomorrow. Have you verified that your classifier's radius function is actually called?

Simon

Hi Simon,

The relevant part of my code is this (I had already defined cg_classifier similarly to the example in the manual instructions):
structure = freesasa.Structure("test.pdb",cg_classifier)
result = freesasa.calc(structure)
If I print cg_classifier.radius('*',atomName='CA') I get the radius I want, so that seems correct. I thought that freesasa.Structure("test.pdb",cg_classifier) was sufficient to give the right results. If not, what should I do?

Ok, I found the problem. There is a mistake in the documentation, one line is missing in the example , that is necessary to make it use your custom classifier for assigning radii: You have to add the line purePython = true to your classifier, i.e.

class CG_Calpha_Classifier(Classifier):
    purePython = True
    def classify(self, residueName, atomName):
        ...

It's mentioned in the docs for the Classifier class itself, http://freesasa.github.io/python/classes.html#classifier, but I will update the documentation to reflect this.

That is correct. I was going through the Classifier class and saw this. Thanks, now it works.
There are a few minor things you might want to consider in the example at https://freesasa.github.io/python/intro.html. (1) Indentation of the last return of DerivedClassifier. (2) Argument of DerivedClassifier should be freesasa.Classifier or one should add "from freesasa import Classifier" at the beginning.

Cheers,
Fabio

Thanks! I have updated the docs with your suggestions and the purePython correction, should show up shortly.

No problem. I would appreciate if you could advice also on freesasa.calcCoord() problem (third post).
Given a pdb structure, I would like to compute the SASA of my Calpha-only representation (which I obtain with mdtraj) without the need of (i) writing the Calphas to disk and (ii) then loading the file into a Structure class.

Thank you.

If I understand you correctly you are performing calculations on a full PDB file but you only want to use the C-alphas. If you're file contains other atoms these will be included even if you set their radius to 0. The probe radius will be added, and therefore their SASA will be non-zero. So you would have to construct a structure that has only CA as you describe.

A solution could be initiate a structure from the original PDB, filter the structure by looping over the atoms and only add the coordinates and radii of C-alphas to new arrays which are then passed to freesasa.calcCoord().

Yes, that is what I was trying to say.
Using mdtraj I read in the original PDB, then extract only the Calpha coordinates in an array (coord) and format them as required by calcCoord(). Calling freesasa.calcCoord(coord, rad_list), where rad_list is a list of radii that I define, does not work, I get the wrong sasa. I do not know what I am doing wrong (or misunderstanding)

rad_list = [3.8 for i in range(nca)] #nca is the number of Calphas
coord = ca_trace.xyz[0] #from mdtraj. coord is a numpy array of shape (N,3)
coord = coord.flatten() #required by calcCoord()
result = freesasa.calcCoord(coord,rad_list)
for idx in range(nca):
sasa = result.atomArea(idx) #this sasa has the wrong value

Yes, I did check this at the very beginning. Here it is again the output of a quick test:

a = np.array([[0,1,2],[3,4,5]]) #this mock arrau has the same mdtraj format i.e. [[x0,y0,z0],[x1,y1,z1]]
a
array([[0, 1, 2],
[3, 4, 5]])
a.shape
(2, 3)
a.flatten()
array([0, 1, 2, 3, 4, 5])

Let me give you some additional clues.

  1. If I run my code on a two-residue pdb structure (two residues put very far apart), and feed the flattened array into calcCoord(), I get exactly the right sasa for each Calpha, i.e. 339.8.
  2. If I use the same exact code on a true protein (1ndy) I get what I think are smaller sasa. How do I know that they are smaller? I know because I tried the "safe approach" i.e. creating a structure object from a modified pdb, the one with only Calphas, and the sasa values are 2 to 3 times larger, compatible with the fact that the radii that I define seem to be ignored. (Btw, I would like to avoid the "safe approach" because I feel it's slower than passing directly an array of coordinates to calcCoord()).

I do not understand if this has to do with the size of the coordinate arrays or else.
Let me know if you want me to send the script and files to reproduce 1 and 2.

Best,
Fabio

Hi Simon,

so I found the problem. I was using mdtraj to read in and manipulate coordinates, but I did not convert from nm (mdtraj uses nm, in fact) to Angstrom. This explains the discrepancies between the sasa values when feeding to calcCoord() the coordinates obtained from mdtraj and the sasa values when using the "safe approach" of loading directly the pdb in a freesasa-structure object.
So, calcCoord() is perfectly fine, I was mistaken by a conversion factor.

Sorry about that,
Fabio

Ok, great to hear! Been away from computer. I'll close the issue then, just reopen if anything else comes up! And thanks for the input on documentation.

hi,
I have got this error while using prodigy.
Running Prodigy for structure gsk3b2_complexfit_24
ERROR:
[!] Error when running freesasa:
[!] Error: Radius array is <= 0 for the residue: HIS ,atom: O1P

I have O2P and O3P atoms too
how can I fix this problem?

Hi, that error message does not come from FreeSASA itself, and I have no knowledge of Prodigy, so not sure I can help you. HIS does not have any atoms labelled O1P, so my guess is there is probably something non-standard in your structure. (Also, please open a new issue when it's not directly related to the issue at hand, so the other people in this thread don't get notifications).