mittinatten/freesasa

Classifier for hydrogen atoms might be not working

Closed this issue · 8 comments

In the following code
classifier = freesasa.Classifier("t.config")
structure = freesasa.Structure("t.pdb",classifier, options={'hydrogen' : True,'hetatm' : True})
result = freesasa.calc(structure,freesasa.Parameters({'algorithm' : freesasa.LeeRichards,'n-slices' : 200,'n-threads':6}))
with these files
Archive.zip
I'm getting
FreeSASA: warning: atom ' DC H5'' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom ' DC H5''' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom ' DC H4'' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom ' DC H3'' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom ' DC H2'' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom ' DC H2''' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom ' DC H1'' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom ' DC H41' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom ' DC H42' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom ' DC H5 ' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom ' DC H6 ' unknown, guessing element is ' H', and radius 1.100 A

So the classifier fails to assign radii provided in config file.

Hi,
Thanks for the bug report. I never wrote a test for this functionality, so I must have broken it somewhere along the way. I'm pretty sure it used to work, so I should have a fix shortly.

Simon

I had a look and it turns out this wasn't an actual error: in the python bindings the structure is first initialized with the default classifier and then the radii are reassigned using the classifier provided by the user, if there is one. The first step produces the warnings you saw, but then correct radii where being assigned.

I have rewritten this to avoid the double assignment in the general case, and suppress the warnings when necessary. See the dev branch.

Thank you! Yes, I can confirm that the assignment actually was working because I was able to reproduce the Naccesss results for the hydrogen atom SASA in DNA along nucleosome (see plot).
h-sasa_compar

There is however, some jitter upto 1 sqA in atomic SASA values between Naccess and Freesasa.
Archive.zip
I attach two file for comparison that were run with very high precision (the parameters are in the header). E.g. for H4' 1.422 vs 0.96
Do you think these small differences are normal manifestation of variations in algorithm implementation (Lee Richards in this case) or there still may be some fix for it?

I will have detailed look later today, but the quick answer is that I found that I got identical results to Naccess at lower resolutions when using the same radii and same resolution, but that there were differences when I went to higher resolutions (I think z < 0.01). Since the results from the two algorithms in Freesasa approach each other asymptotically the higher the resolution I assumed there was either some difference in the implementation that only shows at high resolution (but as far as I can tell they should be identical), or maybe a round-off/precision error in Naccess. My Fortran skills are pretty poor, so I never managed to figure this out by looking at the code.

One simple check would be for you to do the same analysis at a lower resolution. 200 slices, as in your example, would correspond to z=0.005 in Naccess, do you still get differences if you use 100 slices and z=0.01?

While I'm doing more testing - I found out that I used rounded values of atomic radii, that I provided to Naccess. Some radii in CHARMM ff may be as precise as e.g. 0.2245 A.
I truncated them to e.g. 0.22, while used full precision for freesasa - this should be one reason for the discrepancies.

Here is the summary of my testing of freesasa vs Naccess:

  1. Naccess does not handle atomic radii with precision of more than 0.01A, while Freesasa does, so any comparison has to be done with atomic radii rounded off to %.2f
  2. The atomic SASA values calculated with Naccess at z=0.005 match closely values calculated by Freesasa with n=200. The differences are in the 0.01sqA range which to me is a formidable reproducibility.
  3. Once in Naccess precision is increased z=0.0005, 0.00005 - I do not see convergence in Naccess calculation. While Freesasa results on the other hand indeed converge.

I attach the analysis files.
Archive.zip

  1. Good to know! FreeSASA radii are read and stored with double precision.
  2. That's similar to what I found when I did my tests, great to have an independent confirmation.
  3. Seems like you see the same problems I saw with Naccess last year. In your files, when you go from z = 0.0005 to 0.00005 the difference in area is larger than the difference between 0.005 and 0.0005. I would expect the opposite. This is consistent with there being a small precision error for each slice which adds up when you have many slices.

So we can consider this issue closed then?

Definitely, and thank you for your effort!