results that go missing when the protein sequence is made longer (or software version changes)
I don't know if this is expected behavior (I hope not).
I have a 211aa protein sequence that is truncated -- i.e., does not end in a stop codon:
MQTQAFCGIIQIDGTFLFCIKKGNLLIIGTPAPNNRLIPIAFAWSVSENTITIKDMLTKLKSFIPPSRFKNIYSDQGPAIIAAVRESGFSCDHKFCLRHFATKREYINVYSEIVEVAYADHPQKRIDLIKKLETRLQEEYPNRENNQDLFKYLDSINPFEGFADYTAGILTTSLIESLNAEIKDKWDTYEPAELIIRLIEHEFNLVKNVLT
When I run it through interproscan-5.61-93 (as well as two earlier versions), it returns a Pfam MULE transposase domain at residues 9-101 (PF10551/IPR018289).
From other contextual data, I do expect this to be a possible MULE transposase.
When I extend the sequence to the next in-frame stop codon (i.e., complete it), I get a 355aa sequence:
MQTQAFCGIIQIDGTFLFCIKKGNLLIIGTPAPNNRLIPIAFAWSVSENTITIKDMLTKLKSFIPPSRFKNIYSDQGPAIIAAVRESGFSCDHKFCLRHFATKREYINVYSEIVEVAYADHPQKRIDLIKKLETRLQEEYPNRENNQDLFKYLDSINPFEGFADYTAGILTTSLIESLNAEIKDKWDTYEPAELIIRLIEHEFNLVKNVLTGDFKSDNIIKNLNETLKHSDMFSSVLYDPIQELYYATFGRYTYCVKIMSDSQYSCTCKHIELYGLPCIHVIAVLNHFSNKNLLKNLNDAVHARFKCSEFMTPVEDLMKFYVDQASLKIPGINFNLGEIEKLRGKRTRIKAFYEK*
However, when this longer version is run through interproscan-5.61-93, it only returns hits to zinc finger domains/profiles, for residues ~250-290 (i.e., entirely within the added sequence). No hits are returned for any other region.
Moreover, I tested this with InterProScan releases 5.55-88, 5.56-89, 5.61-93, and 5.64-96. The short vs. long behavior is the same for 5.55, 5.56, and 5.61; 5.64 doesn't return a MULE hit at all, even with the short sequence.
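For anyone who wants to repeat the per-release comparison, here is a minimal sketch of how the check for the Pfam hit could be scripted. It assumes each release was run with the default TSV output, where column 1 is the protein accession, column 4 the analysis, column 5 the signature accession, and columns 7-8 the start and stop. The function names `signatures_by_protein` and `has_signature` are hypothetical helpers, not part of InterProScan.

```python
from collections import defaultdict


def signatures_by_protein(tsv_lines):
    """Collect (analysis, signature accession, start, stop) per protein.

    Assumes the standard InterProScan TSV column order; lines with fewer
    than 8 tab-separated fields are skipped.
    """
    hits = defaultdict(set)
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        protein, analysis, accession = fields[0], fields[3], fields[4]
        start, stop = int(fields[6]), int(fields[7])
        hits[protein].add((analysis, accession, start, stop))
    return hits


def has_signature(hits, protein, accession):
    """Return True if the protein has at least one hit to the accession."""
    return any(acc == accession for _, acc, _, _ in hits.get(protein, ()))
```

Reading one TSV file per release with `signatures_by_protein(open(path))` and calling `has_signature(hits, seq_id, "PF10551")` for each input sequence would give a present/absent table by release and sequence.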
I explored the behavior further by extending the 211aa sequence in 25aa increments.
5.55-88 and 5.56-89: MULE @9-101 returned for lengths up to 235aa, but not for >=261aa, where only Zn finger @250-290 is returned
5.61-93: MULE @9-101 returned for lengths up to 285aa, but not for >=311aa, where only Zn finger @250-290 is returned
5.64-96: no MULE hits at any length, only Zn finger @250-290
(Worryingly, the latest version, 5.65-97, on the EBI web portal gives the same results as 5.64-96.)
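The length series above can be regenerated with a short script. This is only a sketch of the procedure, assuming the extended sequence is cut back to successive prefixes starting at the original length. `truncation_series` and `to_fasta` are hypothetical helper names, and the toy sequence in the demo is a placeholder for the 355aa sequence.

```python
def truncation_series(full_seq, start_len, step):
    """Return (name, prefix) pairs of increasing length.

    A trailing '*' (stop) is stripped first. Prefix lengths run from
    start_len in increments of step, and the full length is always the
    last entry.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    seq = full_seq.rstrip("*")
    lengths = list(range(start_len, len(seq), step)) + [len(seq)]
    return [(f"len{n}", seq[:n]) for n in lengths]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapped at width."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder sequence; substitute the full 355aa sequence here.
    toy = "ACDEFGHIKL" * 10 + "*"
    print(to_fasta(truncation_series(toy, start_len=30, step=25)))
```

Calling `truncation_series(full_seq, 211, 25)` on the completed sequence and running the resulting FASTA through each release would give one result set per length.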
So the results vary both with sequence length and with software version.
PF10551/IPR018289 still exist in the InterPro database (they have not been deprecated), so that is not the cause.
Can you explain why this is happening?
Hi @krabapple. I am also having the same issue. Did you find an answer for this?