lh3/miniprot

Using miniprot to identify homologous transposable elements?


It just occurred to me that miniprot might be useful for identifying homologous TEs from evolutionarily distant TE protein sequences, because many TEs retain conserved domains (and structures) even though they evolve much faster than typical protein-coding genes.

Inactive (ancient) TEs can carry frameshifts, internal stop codons, and insertions from other TEs, which behave much like introns in an alignment, so I set -j to 0 to ignore splice sites. Active (more recent) TEs, however, usually have no introns, and miniprot penalises single-exon alignments to avoid pseudogenes.

Which setting should I use to avoid the single-exon penalty? And do you have any other suggestions for using miniprot for this task?