Entity list possibly incomplete
jgreener64 opened this issue · 4 comments
jgreener64 commented
I tested the Biopython MMTF parser on the whole PDB and ran into errors on a few files. I think this is an issue on the mmtf side but am not sure. The files are: 1j6t, 1o2f, 1ts6, 1vrc, 2g10 and 2k9y.
For example, fetch("1o2f") works without an error, but fetch("1o2f").entity_list gives:
[{'chainIndexList': [0, 2, 5],
'description': 'PTS SYSTEM, MANNITOL-SPECIFIC IIABC COMPONENT',
'sequence': 'MANLFKLGAENIFLGRKAATKEEAIRFAGEQLVKGGYVEPEYVQAMLDREKLTPTYLGESIAVPHGTVEAKDRVLKTGVVFCQYPEGVRFGEEEDDIARLVIGIAARNNEHIQVITSLTNALDDESVIERLAHTTSVDEVLELLAGRK',
'type': 'polymer'},
{'chainIndexList': [1, 3, 6],
'description': 'Phosphocarrier protein HPr',
'sequence': 'MFQQEVTITAPNGLHTRPAAQFVKEAKGFTSEITVTSNGKSASAKSLFKLQTLGLTQGTVVTISAEGEDEQKAVEHLVKLMAELE',
'type': 'polymer'}]
and fetch("1o2f").chain_name_list gives:
[u'A', u'B', u'A', u'B', u'B', u'A', u'B', u'B']
So the entity list appears to be incomplete: the chain name list has eight chains (indices 0 to 7), but chain indices 4 and 7 do not appear in any chainIndexList. In the file, these correspond to phosphate residues present only in Models 2 and 3. This leads to key errors in the chain_index_to_type_map function called by Biopython.
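To make the mismatch concrete, here is a minimal sketch that uses only the values pasted above (with the sequences and descriptions left out), so it runs without downloading anything. The helper names uncovered_chain_indices and tolerant_chain_type_map are hypothetical and are not part of mmtf-python or Biopython; the second one only illustrates one way a caller could avoid the key error, not how chain_index_to_type_map is actually written.

```python
# Values copied from the issue text above; sequences and descriptions omitted.
entity_list = [
    {"chainIndexList": [0, 2, 5], "type": "polymer"},
    {"chainIndexList": [1, 3, 6], "type": "polymer"},
]
chain_name_list = ["A", "B", "A", "B", "B", "A", "B", "B"]


def uncovered_chain_indices(entity_list, n_chains):
    """Return chain indices that no entity claims (hypothetical helper)."""
    covered = {i for entity in entity_list for i in entity["chainIndexList"]}
    return [i for i in range(n_chains) if i not in covered]


def tolerant_chain_type_map(entity_list, n_chains, default="unknown"):
    """Map every chain index to an entity type, using `default` when no
    entity lists the chain (hypothetical helper, an assumption of this sketch)."""
    type_map = {i: default for i in range(n_chains)}
    for entity in entity_list:
        for i in entity["chainIndexList"]:
            type_map[i] = entity["type"]
    return type_map


missing = uncovered_chain_indices(entity_list, len(chain_name_list))
type_map = tolerant_chain_type_map(entity_list, len(chain_name_list))
print(missing)      # chain indices absent from every chainIndexList
print(type_map[4])  # falls back to the default instead of raising KeyError
```

Whether a silent fallback like this is desirable is a judgment call; raising a clearer error that names the uncovered chains may be preferable in a parser.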
pwrose commented
I looked at the entity information in the MMTF files, and indeed, in 1o2f for example, the third entity is missing (3 non-polymer syn 'PHOSPHITE ION').
Models should be homogeneous, i.e., they should contain the same number of entities in each model.
However, there are a few legacy structures like this one that don't follow the standards.
We use BioJava to convert .cif files to .mmtf files. We need to investigate how BioJava deals with this situation and see how to fix it.
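As a rough illustration of the homogeneity expectation, the sketch below groups chains by model and compares which entities each model contains. It assumes the chains split across models as [2, 3, 3]; that split is inferred from the chain names reported in this issue and was not read from the file. The function names are hypothetical, and None marks a chain that no entity lists.

```python
# Entity data as reported in this issue (sequences and descriptions omitted).
entity_list = [
    {"chainIndexList": [0, 2, 5], "type": "polymer"},
    {"chainIndexList": [1, 3, 6], "type": "polymer"},
]
# ASSUMPTION: inferred from the reported chain names, not read from the file.
chains_per_model = [2, 3, 3]


def entities_per_model(entity_list, chains_per_model):
    """Return one set of entity indices per model; None marks a chain
    that does not appear in any entity's chainIndexList."""
    chain_to_entity = {}
    for entity_index, entity in enumerate(entity_list):
        for chain_index in entity["chainIndexList"]:
            chain_to_entity[chain_index] = entity_index

    per_model = []
    start = 0
    for n_chains in chains_per_model:
        chain_range = range(start, start + n_chains)
        per_model.append({chain_to_entity.get(i) for i in chain_range})
        start += n_chains
    return per_model


def models_are_homogeneous(entity_list, chains_per_model):
    """True only if every model has the same entities and no orphan chains."""
    per_model = entities_per_model(entity_list, chains_per_model)
    return all(None not in s and s == per_model[0] for s in per_model)


per_model = entities_per_model(entity_list, chains_per_model)
homogeneous = models_are_homogeneous(entity_list, chains_per_model)
print(per_model)
print(homogeneous)
```

Under that assumed split, the first model contains only the two polymer entities, while the second and third each carry one chain with no entity, which matches the missing third entity described above.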
--
Peter Rose, Ph.D.
Director, Structural Bioinformatics Laboratory
San Diego Supercomputer Center
UC San Diego
+1-858-822-5497
pwrose commented
I've filed an issue in BioJava:
biojava/biojava#696
jgreener64 commented
Great, thanks!
abradle commented
I'm going to close this for the time being, as the question has moved to BioJava. We can reopen it later if needed.