zaeleus/noodles

gtf parse error


When reading NCBI's GRCh37_latest_genomic.gtf.gz, it fails with:
Error: Custom { kind: InvalidData, error: InvalidRecord(InvalidAttributes(InvalidEntry(Empty))) };

The first record in GRCh37_latest_genomic.gtf.gz:
NC_000001.10^IBestRefSeq^Igene^I11874^I14409^I.^I+^I.^Igene_id "DDX11L1"; transcript_id ""; db_xref "GeneID:100287102"; db_xref "HGNC:HGNC:37102"; description "DEAD/H-box helicase 11 like 1 (pseudogene)"; gbkey "Gene"; gene "DDX11L1"; gene_biotype "transcribed_pseudogene"; pseudo "true"; $

There is a trailing space at the end of the line, just before the $ end-of-line marker.

Thanks for reporting.

I suspect this is a serialization error in the GTF file, but the format does not define how such a case should be handled. To accommodate it, I changed the parser to trim trailing whitespace in commit 2385606.
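To illustrate why a single trailing space yields an "empty entry" error, and how trimming fixes it, here is a minimal, self-contained sketch. It is NOT the noodles implementation: the names parse_attributes, parse_attributes_strict, parse_entry and ParseError are hypothetical, and it assumes attributes are separated by ';' with a space between each key and its double-quoted value.

```rust
/// Hypothetical error type for this sketch (not noodles' InvalidEntry).
#[derive(Debug, PartialEq)]
enum ParseError {
    /// The text between two ';' separators was blank.
    EmptyEntry,
    /// An entry had no space separating its key from its value.
    MissingValue(String),
}

/// Attribute column of the reported record, including the trailing space.
const REPORTED_FIELD: &str = r#"gene_id "DDX11L1"; transcript_id ""; db_xref "GeneID:100287102"; db_xref "HGNC:HGNC:37102"; description "DEAD/H-box helicase 11 like 1 (pseudogene)"; gbkey "Gene"; gene "DDX11L1"; gene_biotype "transcribed_pseudogene"; pseudo "true"; "#;

/// Strict parse: the field is used exactly as given, without trimming.
fn parse_attributes_strict(field: &str) -> Result<Vec<(String, String)>, ParseError> {
    // Drop the ';' that terminates the last entry. With a trailing space the
    // field ends in "; ", so this does nothing and a blank entry remains.
    let body = field.strip_suffix(';').unwrap_or(field);
    if body.is_empty() {
        return Ok(Vec::new());
    }
    // Naive split: a ';' inside a quoted value would break this sketch.
    body.split(';').map(|entry| parse_entry(entry.trim())).collect()
}

/// Lenient parse: trim trailing whitespace first, then parse strictly.
fn parse_attributes(field: &str) -> Result<Vec<(String, String)>, ParseError> {
    parse_attributes_strict(field.trim_end())
}

/// Splits one `key "value"` entry into its key and unquoted value.
fn parse_entry(entry: &str) -> Result<(String, String), ParseError> {
    if entry.is_empty() {
        return Err(ParseError::EmptyEntry);
    }
    let (key, value) = entry
        .split_once(' ')
        .ok_or_else(|| ParseError::MissingValue(entry.to_string()))?;
    // Strip the surrounding quotes; `""` becomes an empty string.
    Ok((key.to_string(), value.trim().trim_matches('"').to_string()))
}

fn main() {
    // Without trimming, the blank text after the final ';' is rejected.
    println!("strict: {:?}", parse_attributes_strict(REPORTED_FIELD));

    // With trimming, every entry parses.
    for (key, value) in parse_attributes(REPORTED_FIELD).unwrap() {
        println!("{key} = {value:?}");
    }
}
```

Trimming the whole field once, before splitting, keeps entry parsing strict: a genuinely blank entry in the middle of the field (for example "a \"1\";; b \"2\";") is still reported as an error rather than silently skipped.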

Note that the annotations are also available as GFF3 (GRCh37_latest_genomic.gff.gz in refseq_identifiers). I highly recommend favoring GFF3, as GTF/GFF2 is deprecated.