There are several ways to extract intron features as gff3 from a gene_exon gff3 file. The first option is to load the file into a GBrowse database and dump the intron features from it as gff3.
Load the gff3 file into MySQL:
perl bp_bulk_load.pl -u [uname] -p [pass] -d [gbrowse_database] [input.gff3/input.fasta]
Extract intron feature gff3:
perl make_intron_feature.pl -u [uname] -p [pass] -db [gbrowse_database] -o [output.gff3]
Here are the final results.
This is an alternative solution that does not need GBrowse or MySQL. First, download and install the latest versions of misopy and gffutils. Then run the following script:
python extract_intron_gff3_from_gff3.py [input.gff3] [output.gff3]
Finally, filter and sort the output gff3 file (by sequence name, then numerically by start position):
awk '/intron/{print}' output.gff3 | sort -k1,1 -k4,4n > processed_intron.gff3
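For readers who prefer to stay in Python, the filter-and-sort step can be sketched roughly as follows. This is only an illustration, not part of the original workflow: the function name `filter_and_sort_introns` and the sample records are made up for the example. Unlike the awk pattern, which keeps any line containing the word "intron", this sketch assumes tab-separated gff3 and checks only the feature type column.

```python
def filter_and_sort_introns(lines):
    """Return intron records sorted by sequence ID, then by numeric start."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and gff3 directives/comments
        fields = line.split("\t")
        # Column 3 is the feature type, column 4 the 1-based start.
        if len(fields) >= 5 and fields[2] == "intron":
            records.append(fields)
    # Sort by sequence name (as text), then by start (as a number).
    records.sort(key=lambda f: (f[0], int(f[3])))
    return ["\t".join(f) for f in records]


# Hypothetical sample records, used only to exercise the function.
sample = [
    "##gff-version 3",
    "Chr02\tsrc\tintron\t50\t90\t.\t.\t.\tID=b",
    "Chr01\tsrc\tintron\t3929\t6500\t.\t.\t.\tID=a",
    "Chr01\tsrc\texon\t2906\t3475\t.\t-\t.\tID=e",
    "Chr01\tsrc\tintron\t3476\t3505\t.\t.\t.\tID=a",
]
for row in filter_and_sort_introns(sample):
    print(row)
```

To use it on a real file, pass an open file handle instead of the sample list, for example `filter_and_sort_introns(open("output.gff3"))`, and write the returned lines to processed_intron.gff3.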
If you prefer not to type commands, you can use the extract intron feature tool in PlantGenIE Galaxy.
Before:
Chr01 phytozome8_0 gene 2906 6646 . - . ID=Potri.001G000200;Name=Potri.001G000200
Chr01 phytozome8_0 mRNA 2906 6646 . - . ID=PAC:27045395;Name=Potri.001G000200.1;
Chr01 phytozome8_0 exon 6501 6646 . - . ID=PAC:27045395.exon.1;Parent=PAC:27045395;
Chr01 phytozome8_0 CDS 6501 6644 . - 0 ID=PAC:27045395.CDS.1;Parent=PAC:27045395;
Chr01 phytozome8_0 five_prime_UTR 6645 6646 . - . ID=PAC:27045395.five_prime_UTR.1;
Chr01 phytozome8_0 exon 3506 3928 . - . ID=PAC:27045395.exon.2;Parent=PAC:27045395;
Chr01 phytozome8_0 CDS 3506 3928 . - 0 ID=PAC:27045395.CDS.2;Parent=PAC:27045395;
Chr01 phytozome8_0 exon 2906 3475 . - . ID=PAC:27045395.exon.3;Parent=PAC:27045395;
After:
Chr01 phytozome8_0 intron 3476 3505 . . . ID=Potri.001G000200;Parent=PAC:27045395
Chr01 phytozome8_0 intron 3929 6500 . . . ID=Potri.001G000200;Parent=PAC:27045395
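The Before and After records above suggest the underlying idea: an intron is the gap between two consecutive exons of the same transcript. The following is a minimal sketch of that idea using only the Python standard library. It is not the code behind the Galaxy tool or the scripts mentioned earlier; the function names are hypothetical, the input is assumed to be tab-separated, and each exon is assumed to have a single Parent.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Turn 'ID=x;Parent=y;' into {'ID': 'x', 'Parent': 'y'}."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def introns_from_exons(gff3_lines):
    """Return (seqid, source, start, end, parent) for each gap between exons."""
    exons = defaultdict(list)
    for line in gff3_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        parent = parse_attributes(fields[8]).get("Parent")
        if parent is None:
            continue  # an exon without a Parent cannot be grouped
        key = (fields[0], fields[1], parent)
        exons[key].append((int(fields[3]), int(fields[4])))

    introns = []
    for (seqid, source, parent), spans in exons.items():
        spans.sort()  # order exons by start coordinate
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start - prev_end > 1:  # skip touching or overlapping exons
                introns.append((seqid, source, prev_end + 1, next_start - 1, parent))
    return sorted(introns, key=lambda r: (r[0], r[2]))


# The three exon records from the Before example, tab-separated.
before = [
    "Chr01\tphytozome8_0\texon\t6501\t6646\t.\t-\t.\tID=PAC:27045395.exon.1;Parent=PAC:27045395;",
    "Chr01\tphytozome8_0\texon\t3506\t3928\t.\t-\t.\tID=PAC:27045395.exon.2;Parent=PAC:27045395;",
    "Chr01\tphytozome8_0\texon\t2906\t3475\t.\t-\t.\tID=PAC:27045395.exon.3;Parent=PAC:27045395;",
]
for seqid, source, start, end, parent in introns_from_exons(before):
    print("\t".join([seqid, source, "intron", str(start), str(end),
                     ".", ".", ".", "Parent=" + parent]))
```

The sketch writes only a Parent attribute. The After records also carry the gene name as ID, which would require looking up the mRNA and gene records as well.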
The final results look similar to this.
Here we use the output from the steps above (processed_intron.gff3 or output.gff3).
perl exttract_seq_from_gff3.pl -d genome.fa - gene_intron.gff3 > output_intron.fa
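As a rough illustration of what this sequence extraction step involves, here is a small Python sketch. It is not the Perl script above and makes several assumptions: the function names are hypothetical, the gff3 is tab-separated, coordinates are 1-based and inclusive as in gff3, and features on the minus strand are reverse-complemented (intron records with strand "." are returned as they appear in the genome).

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_features(genome, gff3_lines, feature_type="intron"):
    """Return (label, sequence) pairs for every record of the given type."""
    out = []
    for line in gff3_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        seqid, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        seq = genome[seqid][start - 1:end]  # 1-based inclusive -> Python slice
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        out.append(("%s:%d-%d(%s)" % (seqid, start, end, strand), seq))
    return out


# Tiny made-up genome and records, used only to exercise the functions.
genome = read_fasta([">Chr01 demo", "AACCG", "GTTAC"])
records = [
    "Chr01\tsrc\tintron\t3\t6\t.\t.\t.\tID=i1",
    "Chr01\tsrc\tintron\t1\t4\t.\t-\t.\tID=i2",
]
for label, seq in extract_features(genome, records):
    print(">" + label)
    print(seq)
```

For real data, `read_fasta(open("genome.fa"))` would load the genome into memory, which is fine for a sketch but may be too heavy for large genomes.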
Test results are available here.