rm2gff3 converts a RepeatMasker .out output file (which must include the original 3-line header) into a GFF3 annotation file.
For IGV users, the script also adds colors to the main TE classes. This feature requires the following:
- The script expects TE headers to follow the RepeatMasker format: NAME#Order/Superfamily
- The orders LINE, SINE, DNA, LTR, RC, Low_complexity, Satellite and Simple_repeat are colored
- All other classes/orders are colored light grey
- Colors can be changed using HTML hex RGB syntax, e.g. #3399ff
- The script will parse any .out file regardless of the TE header format; beware that in that case the coloring may be inconsistent
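To illustrate the header convention above, here is a Bash sketch that extracts the Order from a NAME#Order/Superfamily string and falls back to Unknown when the header has no '#'. The function name te_order is hypothetical. This is not necessarily how rm2gff3.sh parses headers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rm2gff3.sh): extract the Order from a
# RepeatMasker-style TE header of the form NAME#Order/Superfamily.
te_order() {
    local header="$1"
    local class="${header#*#}"     # drop everything up to and including the first '#'
    if [[ "$class" == "$header" ]]; then
        # no '#': the header does not follow the NAME#Order/Superfamily convention
        echo "Unknown"
        return
    fi
    echo "${class%%/*}"            # keep only what precedes the first '/'
}

te_order "rnd-1_family-42#LINE/L1"   # prints LINE
te_order "(CA)n#Simple_repeat"       # prints Simple_repeat
te_order "myTE_without_class"        # prints Unknown
```

A header without an Order, like the last one, is where the "inconsistent coloring" caveat applies. Such a header can only fall back to the default grey.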
List of colors (can be changed in script):
- LINE (blue, #3399ff)
- SINE (purple, #800080)
- DNA (salmon, #ff6666)
- LTR (green, #00cc44)
- RC (orange, #ff6600)
- Low_complexity (grey/blue, #d1d1e0)
- Satellite (pink, #ff99ff)
- Simple_repeat (dark grey/blue, #8686ac)
- Unknown (grey, #f2f2f2)
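The color table above can be expressed as a simple lookup. This is a minimal sketch that assumes the Order has already been extracted from the TE header. The function name order_color is hypothetical, and the real script may store its colors differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rm2gff3.sh): map a TE order to the hex
# colors listed in this README; any unlisted order falls back to grey.
order_color() {
    case "$1" in
        LINE)           echo "#3399ff" ;;  # blue
        SINE)           echo "#800080" ;;  # purple
        DNA)            echo "#ff6666" ;;  # salmon
        LTR)            echo "#00cc44" ;;  # green
        RC)             echo "#ff6600" ;;  # orange
        Low_complexity) echo "#d1d1e0" ;;  # grey/blue
        Satellite)      echo "#ff99ff" ;;  # pink
        Simple_repeat)  echo "#8686ac" ;;  # dark grey/blue
        *)              echo "#f2f2f2" ;;  # Unknown and all other orders: grey
    esac
}

order_color LTR          # prints #00cc44
order_color Retroposon   # prints #f2f2f2
```

With a case statement, each color sits on its own line, so changing a color means editing a single hex value.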
rm2gff3 is a simple bash script that uses awk to convert the .out file into GFF3. Usage:
./rm2gff3.sh input.repeatmasker.out > output.gff3
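The sketch below shows what such an awk conversion can look like. It is not the code of rm2gff3.sh. It assumes the standard RepeatMasker .out column layout, where column 9 holds the strand (+ or C), column 10 holds the matching repeat and column 11 holds the class/family. The function name rm_out_to_gff3, the source label "RepeatMasker", the feature type "repeat_region" and the "class" attribute are illustrative choices. The sketch writes the directive as "##gff-version 3", the form given in the GFF3 specification. It does not add colors.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the code of rm2gff3.sh: convert RepeatMasker .out
# records into GFF3. Reads the files given as arguments, or stdin if none.
# Assumed .out columns: $1 SW score, $5 query sequence, $6 begin, $7 end,
# $9 strand (+ or C), $10 matching repeat, $11 repeat class/family.
rm_out_to_gff3() {
    echo "##gff-version 3"
    awk 'BEGIN { OFS = "\t" }
         FNR <= 3 { next }   # skip the 3-line RepeatMasker header of each file
         NF < 11  { next }   # skip blank or truncated lines
         {
             strand = ($9 == "C") ? "-" : "+"   # RepeatMasker writes C for the minus strand
             # NR keeps IDs unique even across several input files
             print $5, "RepeatMasker", "repeat_region", $6, $7, $1, strand, ".",
                   "ID=" $10 "_" NR ";Name=" $10 ";class=" $11
         }' "$@"
}
```

Under these assumptions, rm_out_to_gff3 input.repeatmasker.out > output.gff3 mirrors the rm2gff3.sh call above. The 3-line header requirement exists because the first three lines are skipped without being checked.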
The output of rm2gff3 can be piped, but mind the 1-line header containing ##gff-version3.
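Downstream tools such as sort treat the header as an ordinary line, so it may end up misplaced. The sketch below assumes exactly one header line, as described above. It passes that line through unchanged and sorts only the feature lines by seqid and start. sort_gff3 is a hypothetical helper, not part of rm2gff3.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rm2gff3): keep the single header line on
# top and sort the remaining GFF3 feature lines by seqid, then start.
sort_gff3() {
    local header
    IFS= read -r header || return 0   # first line: the gff-version header
    printf '%s\n' "$header"
    sort -t $'\t' -k1,1 -k4,4n        # remaining lines: column 1, then column 4 numerically
}

# Usage: ./rm2gff3.sh input.repeatmasker.out | sort_gff3 > output.sorted.gff3
```

When reading from a pipe, bash's read consumes only the first line. sort therefore receives all the remaining features.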
Please report bugs and comments in the Issues section.
Enjoy!