mittinatten/freesasa

Hydrogens not handled correctly when using Bio.PDB

mtrellet opened this issue · 2 comments

I am opening this issue because I've just run into a problem similar to the one reported in #17, but with what I believe is a slightly different approach: using Biopython as the parser to get the structure.
I ran some tests with a PDB file containing hydrogens and got the following warnings, followed by an AssertionError:

```
FreeSASA: warning: atom 'TRP  H  ' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom 'TRP  HE1' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom 'LEU  H  ' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom 'GLN  H  ' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom 'TYR  H  ' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom 'TYR  HH ' unknown, guessing element is ' H', and radius 1.100 A
FreeSASA: warning: atom 'GLY  H  ' unknown, guessing element is ' H', and radius 1.100 A
Traceback (most recent call last):
  File "/Users/mtrellet/Scripts/get_accessibility_freesasa.py", line 182, in <module>
    print(get_accessibility(pdb_path, segid, config))
  File "/Users/mtrellet/Scripts/get_accessibility_freesasa.py", line 116, in get_accessibility
    struct = structureFromBioPDB(structure, classifier, )
  File "freesasa.pyx", line 973, in freesasa.structureFromBioPDB
  File "freesasa.pyx", line 519, in freesasa.Structure.setRadiiWithClassifier
  File "freesasa.pyx", line 540, in freesasa.Structure.setRadii
AssertionError
```

And here is the code I'm using (Python 3.6):

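```python
from Bio.PDB import PDBParser
from freesasa import Classifier, calc, structureFromBioPDB

path = "structure.pdb"  # placeholder: input PDB file that contains hydrogens
config_path = "naccess.config"  # placeholder: NACCESS classifier config
p = PDBParser(QUIET=True)
structure = p.get_structure('pdb', path)
classifier = Classifier(config_path)  # Here naccess.config
struct = structureFromBioPDB(structure, classifier)  # <- Faulty line
result = calc(struct)
```
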
My understanding is that, since hydrogens are absent from my classifier, they get default radii. However, it seems that those values are somehow wrongly formatted and then trigger this AssertionError later on.

As a test, I tried adding a skip_unknown argument as a workaround, but I cannot get it to pass properly (a dash cannot be used in a Python keyword-argument name):

```
  File "/Users/mtrellet/Dropbox/Scripts/get_accessibility_freesasa.py", line 182, in <module>
    print(get_accessibility(pdb_path, segid, config))
  File "/Users/mtrellet/Dropbox/Scripts/get_accessibility_freesasa.py", line 116, in get_accessibility
    struct = structureFromBioPDB(structure, classifier, skip_unknown=True)
  File "freesasa.pyx", line 927, in freesasa.structureFromBioPDB
TypeError: structureFromBioPDB() got an unexpected keyword argument 'skip_unknown'
```

The last option I have would be to filter out all hydrogens from the structure I get from Biopython, but I suppose this is not the optimal solution here.
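If filtering does turn out to be necessary, a minimal sketch of stripping hydrogens from a Bio.PDB structure could look like the following. It assumes Biopython's Atom.element is set correctly for hydrogens (read from the element column, or guessed from the atom name by PDBParser); the helper name strip_hydrogens is hypothetical and not part of either library.

```python
def strip_hydrogens(structure):
    """Detach every hydrogen (and deuterium) atom from a Bio.PDB Structure, in place.

    Hypothetical helper: relies only on Structure.get_residues(),
    Atom.element, Atom.get_id() and Residue.detach_child().
    """
    for residue in structure.get_residues():
        # Collect the atom ids first; detaching while iterating over the
        # residue would modify the child list being iterated.
        hydrogen_ids = [atom.get_id() for atom in residue if atom.element in ("H", "D")]
        for atom_id in hydrogen_ids:
            residue.detach_child(atom_id)
    return structure
```

The faulty line would then become structureFromBioPDB(strip_hydrogens(structure), classifier). Whether this avoids the AssertionError is untested here, and it changes the set of atoms passed to FreeSASA, so per-atom results no longer line up one-to-one with the original file.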

Thanks in advance for your help and work on this project!!!

Hi,
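I'm not sure your use case is covered in a satisfactory way yet. But I see you used an underscore rather than a dash in skip-unknown? I can't test it myself right now, but I think the call should be structureFromBioPDB(structure, classifier, { 'skip-unknown': True }), i.e. the option goes in as a key of an options dict rather than as a keyword argument, so the dash is not a problem.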

As mentioned in the documentation, the Bio.PDB support hasn't been thoroughly tested yet, so there might be some errors here. In particular, skipped atoms are not assigned an area of 0; they are simply left out (i.e. the array of results will be shorter than the number of atoms in the input), which could be confusing. I should maybe add some mapping back to the original structure before the results are returned, but I'm not sure what's best.
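One possible shape for that mapping, as a minimal sketch: it assumes the caller knows, for each input atom, whether FreeSASA kept it, and that kept atoms appear in the results in the same order as in the input. The function name expand_areas and the kept mask are hypothetical, not part of the freesasa API.

```python
from typing import List, Sequence


def expand_areas(kept: Sequence[bool], areas: Sequence[float]) -> List[float]:
    """Spread per-atom areas back over the full input atom list.

    Hypothetical helper: kept[i] says whether input atom i was passed to
    FreeSASA, and areas holds one area per kept atom, in input order.
    Skipped atoms get an area of 0.0.
    """
    if sum(kept) != len(areas):
        raise ValueError(
            f"{sum(kept)} atoms were kept but {len(areas)} areas were given"
        )
    area_iter = iter(areas)
    # Consume one area for every kept atom; pad skipped atoms with 0.0.
    return [next(area_iter) if k else 0.0 for k in kept]


# Example: the second input atom (e.g. a hydrogen) was skipped.
print(expand_areas([True, False, True], [12.5, 3.0]))  # [12.5, 0.0, 3.0]
```

With the Python bindings, the areas list could be built as [result.atomArea(i) for i in range(struct.nAtoms())]; whether structureFromBioPDB preserves input order for the atoms it keeps is an assumption to verify.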

Simon

Closing this due to inactivity (and it belongs in https://github.com/freesasa/freesasa-python).