biopragmatics/pyobo

Error while trying to write `ncbigene.obo` via pyobo


I called pyobo from the following code:

from pyobo.sources.ncbigene import get_obo

def get_obo_file(output_path: str):
    """Get the obo file."""
    obo_file = get_obo()
    obo_file.write_obo(output_path, use_tqdm=True)

This resulted in the following error:

Traceback (most recent call last):
....
  /src/ncbi_gene_pyobo/cli.py", line 57, in get_obo_file
    obo_file.write_obo(output_path, use_tqdm=True)
  File "/opt/anaconda3/envs/ncbi-gene-pyobo/lib/python3.10/site-packages/pyobo/struct/struct.py", line 711, in write_obo
    self._write_lines(it, fh)
  File "/opt/anaconda3/envs/ncbi-gene-pyobo/lib/python3.10/site-packages/pyobo/struct/struct.py", line 717, in _write_lines
    for line in it:
  File "/opt/anaconda3/envs/ncbi-gene-pyobo/lib/python3.10/site-packages/pyobo/struct/struct.py", line 700, in iterate_obo_lines
    yield from term.iterate_obo_lines(ontology=self.ontology, typedefs=self.typedefs)
  File "/opt/anaconda3/envs/ncbi-gene-pyobo/lib/python3.10/site-packages/pyobo/struct/struct.py", line 429, in iterate_obo_lines
    yield f"def: {self._definition_fp()}"
  File "/opt/anaconda3/envs/ncbi-gene-pyobo/lib/python3.10/site-packages/pyobo/struct/struct.py", line 400, in _definition_fp
    return f'"{obo_escape_slim(self.definition)}" [{comma_separate(self.provenance)}]'
  File "/opt/anaconda3/envs/ncbi-gene-pyobo/lib/python3.10/site-packages/pyobo/struct/utils.py", line 24, in obo_escape_slim
    rv = "".join(OBO_ESCAPE_SLIM.get(character, character) for character in string)
TypeError: 'float' object is not iterable

The error points to line 24 of pyobo/struct/utils.py, the first statement in the body of the function below.

def obo_escape_slim(string: str) -> str:
    """Escape all funny characters for OBO."""
    rv = "".join(OBO_ESCAPE_SLIM.get(character, character) for character in string)
    rv = rv.replace("\n", "\\n")
    return rv
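
For reference, the failure can be reproduced without pyobo. This is a minimal sketch that assumes the term's definition arrived as a float NaN (the traceback only says 'float'); `nan_definition` and `message` are illustrative names, not part of pyobo.

```python
# Assumption: the definition reached obo_escape_slim as a float NaN,
# e.g. from an empty cell in a pandas DataFrame.
nan_definition = float("nan")

message = None
try:
    # Same shape as the failing line: iterate over the value character by character.
    "".join(character for character in nan_definition)
except TypeError as exc:
    message = str(exc)

print(message)
```

The generator expression calls iter() on the float, which is what raises the TypeError seen at the bottom of the traceback.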

Casting string with str() inside the join() call, as below, and rerunning fixed the issue:

    rv = "".join(OBO_ESCAPE_SLIM.get(character, character) for character in str(string))

Stringifying a NaN from pandas isn't the right solution, since it would write the literal text "nan" into the definition. This might have already been fixed in cc2f27b.
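
One possible alternative to str() is to treat NaN as "no definition" before escaping. The sketch below is only an illustration and is not necessarily what cc2f27b does: `ESCAPES` is a hypothetical stand-in for pyobo's OBO_ESCAPE_SLIM table (the real mapping may differ), and `is_missing` and `escape_definition` are made-up helper names.

```python
import math
from typing import Optional

# Hypothetical stand-in for pyobo's OBO_ESCAPE_SLIM; the real table may differ.
ESCAPES = {"\\": "\\\\", '"': '\\"'}


def is_missing(value: object) -> bool:
    """Return True for None or a float NaN (what pandas uses for empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def escape_definition(value: object) -> Optional[str]:
    """Escape a definition for OBO, or return None when there is no definition."""
    if is_missing(value):
        return None
    if not isinstance(value, str):
        # Fail loudly on unexpected types instead of silently stringifying them.
        raise TypeError(f"definition must be a str, got {type(value).__name__}")
    rv = "".join(ESCAPES.get(character, character) for character in value)
    return rv.replace("\n", "\\n")


# str() would turn the missing value into real text:
stringified = str(float("nan"))
```

With this approach the caller can skip the def: line entirely when escape_definition returns None, rather than emitting a definition that reads "nan".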