KULL-Centre/PRISM

[domain_protein_features] pdb not assigned to correct domain


In some cases, the uniprot_start and uniprot_end for a given PDB do not fall within the sequence range of the domain they are filed under.
image

This happens because the protein has several domains with identical names that span different sequence ranges:
http://pfam.xfam.org/protein/P06654#tabview=tab0

I'm slightly confused as to what exactly the problem is. Domains often occur in repeats, and that would not be a bug. But perhaps you mean something else?

image

For example, suppose there is a PDB covering the domain at 222-284. In the output file its uniprot start and end would be 227-282, and it would be assigned to that domain.
However, if there is another domain with an identical name at 291-354, the domain assignment is overwritten. The PDB with uniprot range 227-282 is then grouped with the 291-354 domain instead of the 222-284 domain. Examples of this can be seen in the table above.

Oh, if it's overwritten, that is of course a problem. The domain name alone is not enough as an index.
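
To make the overwrite concrete, here is a minimal sketch of the failure and one possible fix. It is not the actual PRISM code: the domain name `DomainX`, the `domains` list, and the helpers `overlap` and `assign_domain` are hypothetical, and only the residue ranges come from the example above. The idea is to index domains by (name, start, end) rather than by name, and to assign a PDB to the repeat it overlaps most.

```python
# Hypothetical domain annotations: two repeats sharing one name.
# "DomainX" is a placeholder; the ranges are the ones from the example above.
domains = [
    ("DomainX", 222, 284),
    ("DomainX", 291, 354),
]

# Indexing by name only: the second repeat silently overwrites the first.
by_name = {}
for name, start, end in domains:
    by_name[name] = (start, end)

# Indexing by (name, start, end): both repeats are kept as separate keys.
by_key = {(name, start, end): [] for name, start, end in domains}


def overlap(a_start, a_end, b_start, b_end):
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def assign_domain(pdb_start, pdb_end, domains):
    """Return the (name, start, end) tuple with the largest overlap, or None."""
    best = max(domains, key=lambda d: overlap(pdb_start, pdb_end, d[1], d[2]))
    if overlap(pdb_start, pdb_end, best[1], best[2]) == 0:
        return None
    return best


# The PDB from the example covers uniprot residues 227-282.
hit = assign_domain(227, 282, domains)
by_key[hit].append("pdb 227-282")

print(by_name["DomainX"])  # only the last repeat survives under the name key
print(hit)                 # the repeat that actually contains the PDB range
```

Whether largest overlap is the right rule for PDBs that straddle two repeats is a separate design question. The sketch only shows that a composite key keeps the repeats apart.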