SeqID does not end with a number.
tbazilegith opened this issue
Hello,
I ran gff3_sort with the command below and got the following error:
gff3_sort --gff_file mysample_results20220802/annot.gff --output_gff mysample_sort.gff3
ERROR [SeqID] SeqID does not end with a number.
- Line 6: 1 Local region 1 3396752 . + . ID=1:1..3396752;Dbxref=taxon:1386;Is_circular=true;Name=ANONYMOUS;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=replaceme
Adding argument -r like " gff3_sort -g example_file/example.gff3 -og example-sorted.gff3 -r " can handle this situation.
I went ahead and added the -r flag:
gff3_sort --gff_file mysample_results20220802/annot.gff --output_gff mysample_sort.gff3 -r
But I got this traceback:
Traceback (most recent call last):
  File "/apps/gff3toolkit/2.0.3/bin/gff3_sort", line 8, in
    sys.exit(script_main())
  File "/apps/gff3toolkit/2.0.3/lib/python3.9/site-packages/gff3tool/bin/gff3_sort.py", line 437, in script_main
    main(args.gff_file, output=args.output_gff, isoform_sort=args.isoform_sort, sorting_order=sorting_order, logger=logger_stderr, reference=args.reference)
  File "/apps/gff3toolkit/2.0.3/lib/python3.9/site-packages/gff3tool/bin/gff3_sort.py", line 223, in main
    sequence_regions[sequence_region['seqid']] = (sequence_region['start'], sequence_region['end'])
KeyError: 'end'
It seems to me that the "Line 6" above needs to be skipped in annot.gff.
Any thoughts on that?
Thanks,
TJ
@tbazilegith this error looks similar to the one reported in #125. Can you post some examples of the ##sequence-region directive lines? Do they all have a number as the end coordinate?
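To answer that question for a whole file at once, here is a minimal sketch that lists every ##sequence-region line that does not have the form "seqid start end" with integer coordinates. It is not part of gff3toolkit; the function name find_bad_sequence_regions and the script name check_regions.py are hypothetical, and the check is a plain regular expression rather than the toolkit's own parser.

```python
import re
import sys

# A well-formed directive: "##sequence-region seqid start end",
# fields separated by whitespace, start and end as integers.
DIRECTIVE = re.compile(r"^##sequence-region\s+(\S+)\s+(\d+)\s+(\d+)\s*$")


def find_bad_sequence_regions(lines):
    """Return (line_number, text) for every ##sequence-region line that does
    not match 'seqid start end' with integer coordinates (hypothetical helper)."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##sequence-region") and not DIRECTIVE.match(line):
            bad.append((lineno, line))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_regions.py annot.gff
    with open(sys.argv[1]) as handle:
        for lineno, text in find_bad_sequence_regions(handle):
            print(f"line {lineno}: {text}")
```

Run against the header posted in this thread, it would flag "##sequence-region 1 3396752" because only two fields follow the directive name.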
Hello MPoelchau,
Here is what I have:
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region 1 3396752
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1386
1 Local region 1 3396752 . + . ID=1:1..3396752;Dbxref=taxon:1386;Is_circular=true;Name=ANONYMOUS;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=replaceme
1 . pseudogene 1 144 . - . ID=gene-tmp_000001;Name=tmp_000001;gbkey=Gene;gene_biotype=pseudogene;locus_tag=tmp_000001;pseudo=true
Thanks,
TJ
@tbazilegith it looks like the sequence-region directive is missing a '1' (representing either the seqid or the start coordinate). The format is ##sequence-region seqid start end; with only two values there is no end coordinate, which is consistent with the KeyError: 'end' above. So it should instead be:
##sequence-region 1 1 3396752
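If a file has many directives in this two-field form, the fix can be scripted. This is a minimal sketch, not part of gff3toolkit, that assumes the missing value is the start coordinate and that every region starts at 1 (true for the directive in this issue, but check your own data). The names fix_sequence_region and fix_file are hypothetical.

```python
import re

# Two-field form seen in this issue: "##sequence-region <seqid> <end>".
# ASSUMPTION: the missing field is the start coordinate, and it is 1.
TWO_FIELD = re.compile(r"^##sequence-region\s+(\S+)\s+(\d+)\s*$")


def fix_sequence_region(line):
    """Rewrite '##sequence-region seqid end' as '##sequence-region seqid 1 end'.
    Any other line, including well-formed directives, is returned unchanged."""
    match = TWO_FIELD.match(line.rstrip("\n"))
    if match is None:
        return line
    seqid, end = match.groups()
    newline = "\n" if line.endswith("\n") else ""
    return f"##sequence-region {seqid} 1 {end}{newline}"


def fix_file(src_path, dst_path):
    """Copy src_path to dst_path, repairing two-field sequence-region directives."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(fix_sequence_region(line))
```

For example, fix_file("mysample_results20220802/annot.gff", "annot.fixed.gff") writes a corrected copy (the output name is hypothetical) that can then be passed to gff3_sort --gff_file. Only header directives are touched; feature lines are copied as-is.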
@tbazilegith just following up: did fixing the sequence-region directive work for you?
I'll close this issue but feel free to re-open if that didn't help.