Questions regarding tsebra
bauerlev opened this issue · 1 comments
Hello, I have a few questions regarding tsebra. I know it has its own GitHub repository, but that doesn't seem to be active, and since your group maintains both tools, I'm hoping this is an appropriate place to ask.
- What does each of these parameters in the config file actually mean? I can't find an answer anywhere in the documentation.
Allowed difference for each feature
Values have to be in [0,2]
e_1 0.0
e_2 0.5
e_3 0.096
e_4 0.02
e_5 0.18
e_6 0.18
- Is there documentation on how the script get_longest_isoform works? We noticed multiple transcripts for a given locus after running braker3, so I tried this script. It helped, but there are still instances where there's more than one transcript for a given locus.
Thanks for your help! We've been having significantly better success with braker over maker and I'm very grateful.
Hello @bauerlev,
I hope you find this helpful:
"Hi, I will upload a TSEBRA version with the keep-all option by the end of this week.
Your command line looks correct and it should work.
You might be correct that the configuration of the long-read version of TSEBRA isn't fitted for all species as the amount of long-read data available during development was very limited. If you want to adjust the configuration, I would suggest that you try different values for intron_support, e_1, e_4, e_5, e_6.
The support values in the config file specify the minimum fraction that has to be supported by extrinsic evidence. If a transcript's evidence support for start/stop codons and introns is lower than these values, it will be filtered out. For the current long-read configuration, this means that all transcripts must have either all introns or their stop codon supported. I would suggest decreasing intron_support if you want to change anything here. This can be especially helpful if you think that the sensitivity at the gene level is not high enough.
The e parameters are thresholds that are used to allow some difference between the different scores of two transcripts at the same locus. In short, the thresholds correspond to scores as follows: e_1: relative fraction of supported introns, e_2: relative fraction of supported stop-codons, e_3: relative fraction of supported start-codons, e_4: absolute intron support, e_5: absolute stop-codon support, e_6: absolute start-codon support. If you want to go more in-depth, you can take a look at our paper. I would try to increase e_1, e_4, e_5, e_6, especially if you want to keep more alternative isoforms per gene."
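
To make the role of the e thresholds a bit more concrete, here is a minimal Python sketch. It is not TSEBRA's code. It assumes the config is plain "key value" lines (as in the excerpt above), that the six scores are compared in the order e_1 to e_6, and that a transcript is dropped only when a score difference exceeds the matching threshold. The function names `parse_config`, `out_of_range`, and `compare`, and the example score vectors, are hypothetical; the actual rules and score definitions are described in the TSEBRA paper.

```python
# Illustrative sketch only; NOT TSEBRA's implementation.
# Assumption: config lines are "key value"; anything else is ignored.

CONFIG_TEXT = """\
Allowed difference for each feature
Values have to be in [0,2]
e_1 0.0
e_2 0.5
e_3 0.096
e_4 0.02
e_5 0.18
e_6 0.18
"""

E_KEYS = ["e_1", "e_2", "e_3", "e_4", "e_5", "e_6"]


def parse_config(text):
    """Collect numeric 'key value' pairs; skip comments and free text."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line or descriptive text
        key, value = parts
        try:
            params[key] = float(value)
        except ValueError:
            continue  # second token is not a number
    return params


def out_of_range(params, low=0.0, high=2.0):
    """Return the e_* keys whose values fall outside [low, high]."""
    return [k for k in E_KEYS if k in params and not (low <= params[k] <= high)]


def compare(scores_a, scores_b, eps):
    """Hypothetical pairwise comparison of two transcripts at one locus.

    Walk through the scores in order. The first score whose difference
    exceeds its threshold decides the winner ('a' or 'b'). If no
    difference exceeds its threshold, return None (keep both).
    """
    for s_a, s_b, e in zip(scores_a, scores_b, eps):
        if abs(s_a - s_b) > e:
            return "a" if s_a > s_b else "b"
    return None


params = parse_config(CONFIG_TEXT)
eps = [params[k] for k in E_KEYS]

# Made-up score vectors for two overlapping transcripts.
tx_a = [1.0, 1.0, 1.0, 0.9, 0.5, 0.5]
tx_b = [1.0, 0.8, 1.0, 0.9, 0.5, 0.5]

print(out_of_range(params))     # keys violating the [0,2] constraint
print(compare(tx_a, tx_b, eps)) # which transcript wins, or None
```

Under these assumptions, larger thresholds make `compare` return `None` more often, so more transcripts at a locus survive. That matches the advice above to increase e_1, e_4, e_5, and e_6 when you want to keep more alternative isoforms.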