Is it necessary to mask repetitive sequences?
XXH123a commented
Hi Professor Li, sorry to bother you. I have a few questions:
- Should there be no difference between running miniprot on a soft-masked genome and running it on an unmasked genome?
- If hard masking is used, some CDS sequences extracted from the GFF file contain roughly 10 to 100 Ns. How should I deal with these N-containing CDS sequences?
- Whether or not the repeats are masked, how should I filter the extracted CDS sequences? For example, should I discard every protein shorter than 50 amino acids, or use some other criterion?
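The N and length checks asked about above could be scripted along these lines. This is only a minimal sketch, not a recommendation from miniprot: the 50 amino acid cutoff comes from the question, while the 5% N limit (`max_n_frac`) is an arbitrary, hypothetical threshold, and protein length is approximated as the number of complete codons (which still counts a terminal stop codon).

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_fraction(seq):
    """Fraction of bases that are N/n (e.g. from hard-masked repeats)."""
    return sum(1 for c in seq if c in "Nn") / len(seq) if seq else 1.0


def keep_cds(seq, min_aa=50, max_n_frac=0.05):
    """Hypothetical filter: keep a CDS with at least min_aa complete codons
    (stop codon included) and at most max_n_frac of its bases being N."""
    return len(seq) // 3 >= min_aa and n_fraction(seq) <= max_n_frac


example = [
    ">cds1", "ATG" + "GCT" * 60 + "TAA",               # 62 codons, no N
    ">cds2", "ATG" + "GCT" * 50 + "N" * 60 + "TAA",    # long enough, ~28% N
    ">cds3", "ATGGCTTAA",                              # only 3 codons
]
kept = [name for name, seq in read_fasta(example) if keep_cds(seq)]
print(kept)  # ['cds1']
```

Whether to drop N-rich CDS entirely or keep them with a flag depends on the downstream analysis; the thresholds are parameters precisely so they can be tuned.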
The following is a CDS sequence extracted from the GFF file that miniprot produced when annotating with proteins from the same species. I was also a little confused by the lengths of the proteins translated from it with seqkit and with gffread.
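One common source of such length differences is how a translator treats the terminal stop codon and any trailing partial codon. The exact defaults of `seqkit translate` and `gffread -y` are not checked here, so the sketch below only illustrates the arithmetic with the standard genetic code; the `keep_stop` and `partial` options are hypothetical and do not correspond to either tool's flags. Codons containing N (as in hard-masked regions) are translated as `X`.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds, keep_stop=True, partial="drop"):
    """Translate a CDS with the standard code (hypothetical options).

    keep_stop: keep a terminal '*' for the stop codon.
    partial: 'drop' ignores a trailing 1-2 nt fragment; 'x' appends 'X' for it.
    Codons containing N or other ambiguous bases become 'X'.
    """
    cds = cds.upper()
    full = len(cds) - len(cds) % 3
    protein = "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, full, 3))
    if not keep_stop and protein.endswith("*"):
        protein = protein[:-1]
    if partial == "x" and len(cds) % 3:
        protein += "X"
    return protein


cds = "ATGGCTAAATGA"  # ATG GCT AAA TGA -> M A K stop
for keep_stop in (True, False):
    print(keep_stop, len(translate(cds, keep_stop=keep_stop)))
# True 4
# False 3

cds_partial = cds + "GC"  # 14 nt, not a multiple of 3
print(len(translate(cds_partial, partial="drop")), len(translate(cds_partial, partial="x")))
# 4 5
```

So the same 12 nt CDS can be reported as a 3 or 4 residue protein depending only on stop-codon handling, which is worth ruling out before suspecting the alignment itself.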
lh3 commented
As you showed, repeat masking affects the alignment of some proteins. I don't know whether the overall effect is positive or negative; you will have to investigate that yourself.