tanghaibao/goatools

Allow arbitrary letters in Aspect column

serenalotreck opened this issue · 2 comments

I'm trying to use GafReader to read in GAF files for the Planteome database. These files are GAF 2.0 compliant with one exception -- in the "Aspect" column (NS in the GafReader.associations named tuple), there are letters besides the P, F, and C that are specified in the GAF docs.

As a result, when I run the tool, I get the following error:

Traceback (most recent call last):
  File "/mnt/home/lotrecks/anaconda3/envs/dygiepp/lib/python3.7/site-packages/goatools/anno/init/reader_gaf.py", line 92, in _read_gaf_nts
    self._add_data0(nts, lnum, line, get_all_nss, namespaces, datobj)
  File "/mnt/home/lotrecks/anaconda3/envs/dygiepp/lib/python3.7/site-packages/goatools/anno/init/reader_gaf.py", line 114, in _add_data0
    nspc = GafData.aspect2ns[flds[8]]  # 8 GAF Aspect -> BP, MF, or CC
KeyError: 'T'

  **FATAL-gaf: 'T'

**FATAL-gaf: /mnt/scratch/lotrecks/planteome_attempt1/to_gene_Oryza_Gramene.assoc[3]:
GR_gene	GR:0060141	CL		TO:0000089	GR_REF:1793	IMP		T	CLUSTERED SPIKELETS	Cl|Clustered spikelets|Cl|Clustered spikelets	gene	taxon:4530	20121108	Gramene		

 0) REQ DB                   GR_gene
 1) REQ DB_ID                GR:0060141
 2) REQ DB_Symbol            CL
 3)     Qualifier            
 4) REQ GO_ID                TO:0000089
 5) REQ DB_Reference         GR_REF:1793
 6) REQ Evidence_Code        IMP
 7)     With_From            
 8) REQ NS                   T
 9)     DB_Name              CLUSTERED SPIKELETS
10)     DB_Synonym           Cl|Clustered spikelets|Cl|Clustered spikelets
11) REQ DB_Type              gene
12) REQ Taxon                taxon:4530
13) REQ Date                 20121108
14) REQ Assigned_By          Gramene
15)     Extension            
16)     Gene_Product_Form_ID 

Would it be possible to allow the NS field to have any letter? I'm trying to avoid writing my own GAF parser since you already have what seems to be a fairly robust one, but I'm not sure how else to get around this issue.
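To make the request concrete, here is a minimal sketch of the tolerant behavior I have in mind. It is not goatools code: `ASPECT2NS`, `aspect_to_namespace`, and `read_aspects` are hypothetical names for illustration, and the sketch assumes that passing an unrecognized Aspect letter through unchanged would be acceptable downstream.

```python
# Standard GAF Aspect letters. This dict is a local stand-in for illustration,
# not the mapping used inside goatools.
ASPECT2NS = {"P": "BP", "F": "MF", "C": "CC"}


def aspect_to_namespace(aspect, allow_unknown=True):
    """Map a GAF Aspect letter to a namespace.

    Unknown letters (such as 'T' in the Planteome files) are returned
    unchanged when allow_unknown is True; otherwise the KeyError propagates.
    """
    try:
        return ASPECT2NS[aspect]
    except KeyError:
        if allow_unknown:
            return aspect
        raise


def read_aspects(lines):
    """Yield (DB_ID, term ID, namespace) for each tab-separated GAF data line."""
    for line in lines:
        # Skip blank lines and GAF header/comment lines, which start with '!'.
        if not line.strip() or line.startswith("!"):
            continue
        flds = line.rstrip("\n").split("\t")
        # Column 1 is DB_ID, column 4 is the term ID, column 8 is Aspect (0-based).
        yield flds[1], flds[4], aspect_to_namespace(flds[8])
```

With this lookup, the line that triggers the error above would yield a namespace of `T` instead of raising `KeyError`.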

Thanks!

Thanks for the terrific contribution.

I added functionality for a similar issue to the obo reader for issue 202 (accepting ontologies like the Human Phenotype Ontology (HPO)).

Running the regression tests prior to pushing exposed some annotation file issues (geneontology/helpdesk#358 and geneontology/helpdesk#359).

After I resolve the test issues, I will push the new functionality for issue 202.

Thank you again for your interest in GOATOOLS, for taking the time to write us, and for the terrific code contribution.

Thanks so much for the quick response & merge!