Suboptimal folding misses some structures / Lonely pairs aren't optional
Consider this suboptimal folding example (all default parameters, but with an infinite energy window):
> GGAAAACC
>> ........ 0
>> [[....]] 12
>> m[....]M 35
>> .[....]3 37
>> [......] 39
>> .[....]. 45
>> 5[....]. 45
Now consider removing the outermost G and C:
> GAAAAC
>> ...... 0
Previously we (correctly) got ".[....].", so the equivalent structure "[....]" should still exist here.
This is probably caused by the no-lonely-pairs heuristic.
Which one is considered correct? Surely it should be consistent with the partition function and MFE folding.
It is correct to have "[....]" (the GAAAAC counterpart of ".[....].") as a valid structure for "GAAAAC". I think I have verified that this is caused by the no-lonely-pairs heuristic. It would be nice to have an option to turn it off.
The heuristic is implicitly applied in the ViableFoldingPair function in fast_energy.h.
I mean: is there a consensus on whether the lonely pairs heuristic should be used or not? Does it differ between MFE, subopt, and the partition function?
It's a heuristic that improves MFE accuracy, but nobody knows why. It is claimed that lonely pairs are not expected to occur in real structures, but there is no good source for this claim (or for the heuristic). See https://doi.org/10.1186/1741-7007-4-5
I think the best solution is to include it as an option, for feature parity with RNAstructure and Vienna (which both use it). I prefer not to use the heuristic for subopt or the partition function, although Vienna and RNAstructure do by default.
EDIT: actually it works! Also, the lonely pairs check in memerna is incorrect: the "CanPair(gr[st + 1], gr[en - 1])" condition does not catch the case where st+1 and en-1 are too close together to form a pair. I am not sure whether Vienna or RNAstructure handle this case correctly either.
BTW, my PhD thesis has some good references on lonely pairs if you want to learn more. The TL;DR is that it's a folklore heuristic that seems (as far as I can tell) to exist because the M&T thermodynamic model overpredicts lonely pairs compared to real structures. Despite some claims in the literature, lonely pairs DO exist in real structures; they are just rare.
Since there is no good principle behind it, I like to have the option to not use the heuristic.