rcsb/mmtf-python

Entity list possibly incomplete

jgreener64 opened this issue · 4 comments

I tested the Biopython MMTF parser on the whole PDB and ran into errors on a few files. I think this is an issue on the mmtf side but am not sure. The files are: 1j6t, 1o2f, 1ts6, 1vrc, 2g10 and 2k9y.

For example, fetch("1o2f") works without an error, but fetch("1o2f").entity_list gives

[{'chainIndexList': [0, 2, 5],
  'description': 'PTS SYSTEM, MANNITOL-SPECIFIC IIABC COMPONENT',
  'sequence': 'MANLFKLGAENIFLGRKAATKEEAIRFAGEQLVKGGYVEPEYVQAMLDREKLTPTYLGESIAVPHGTVEAKDRVLKTGVVFCQYPEGVRFGEEEDDIARLVIGIAARNNEHIQVITSLTNALDDESVIERLAHTTSVDEVLELLAGRK',
  'type': 'polymer'},
 {'chainIndexList': [1, 3, 6],
  'description': 'Phosphocarrier protein HPr',
  'sequence': 'MFQQEVTITAPNGLHTRPAAQFVKEAKGFTSEITVTSNGKSASAKSLFKLQTLGLTQGTVVTISAEGEDEQKAVEHLVKLMAELE',
  'type': 'polymer'}]

and fetch("1o2f").chain_name_list gives

[u'A', u'B', u'A', u'B', u'B', u'A', u'B', u'B']

So the entity list appears to be incomplete: it is missing two of the chains, at indices 4 and 7 (both named B). In the file these correspond to phosphate residues that are present only in Models 2 and 3. This leads to KeyErrors in the chain_index_to_type_map function called by Biopython.

Great, thanks!

I'm going to close this for the time being, as the question has moved to BioJava. It can be reopened later on, though.