labgem/PPanGGOLiN

Reading the gbff file error

seajane opened this issue · 4 comments

Hello! I am trying to create a new pangenome analysis and am encountering an error during the initial file read. I get the following error: "Exception: Reading the gbff file 'Gsp_SRRXXX.gbff' raised an error. Genetic code should be an integer". The header for this data file is very simple. It looks like this:

LOCUS scaffold00000001 243110 bp DNA UNK 17-MAY-2024
DEFINITION CAXXXX010000001.1 MAG TPA_asm: uncultured isolate
SRRXXX_bin.10_metaWRAP_v1.3_MAG genome assembly, contig:
ERZ13382858.13, whole genome shotgun sequence.
ACCESSION CAXXXX010000001
VERSION CAXXXX010000001.1
KEYWORDS .
SOURCE KBase_KBase
ORGANISM spp.
COMMENT Genome imported from RAST annotationRenamed contig from
CAXXXX010000001.1 because the original name exceeded 16 characters
FEATURES Location/Qualifiers
gene 350..1606
/product="Serine/threonine:Na+ symporter SstT"
/translation="MHKIVQKWNNIDLILRIVIGLFLGAALGIFVPAHVYVLDLMGKLF
VSALKSVAPILVFCLVIG...

Is there some way to fix my metadata or your program so that it defaults to genetic code 11?

I should note that other files don't have a genetic code specified and import just fine. I am using version 2.0.5.

This is the full error:

"Traceback (most recent call last):
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/lib/python3.10/site-packages/ppanggolin/annotate/annotate.py", line 435, in read_anno_file
return read_org_gbff(organism_name, filename, circular_contigs, pseudo)
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/lib/python3.10/site-packages/ppanggolin/annotate/annotate.py", line 165, in read_org_gbff
create_gene(organism, contig, gene_counter, rna_counter, locus_tag, dbxref, start, stop, strand,
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/lib/python3.10/site-packages/ppanggolin/annotate/annotate.py", line 85, in create_gene
new_gene.fill_annotations(start=start, stop=stop, strand=strand, gene_type=gene_type, name=gene_name,
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/lib/python3.10/site-packages/ppanggolin/genome.py", line 299, in fill_annotations
raise TypeError("Genetic code should be an integer")
TypeError: Genetic code should be an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/lib/python3.10/site-packages/ppanggolin/annotate/annotate.py", line 437, in read_anno_file
raise Exception(f"Reading the gbff file '{filename}' raised an error. {err}")
Exception: Reading the gbff file 'Gsp_SRRXXX.gbff' raised an error. Genetic code should be an integer
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/bin/ppanggolin", line 8, in <module>
sys.exit(main())
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/lib/python3.10/site-packages/ppanggolin/main.py", line 177, in main
ppanggolin.annotate.launch(args)
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/lib/python3.10/site-packages/ppanggolin/annotate/annotate.py", line 666, in launch
read_annotations(pangenome, args.anno, cpu=args.cpu, pseudo=args.use_pseudo, disable_bar=args.disable_prog_bar)
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/lib/python3.10/site-packages/ppanggolin/annotate/annotate.py", line 527, in read_annotations
org, flag = future.result()
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-d/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
Exception: Reading the gbff file 'Gsp_SRRXXX.gbff' raised an error. Genetic code should be an integer"

I also get the warning "Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions." when running ppanggolin on my Mac, but this hasn't caused any errors previously.

Hi,

I see. I believe this is related to the lack of a "/transl_table" field in the gbff. If that's the case, it should be easy to fix.

By "other files import just fine", do you mean gff and fasta, or other gbff files?

For the Mac warning, I've never seen this, but I don't have access to a "newer" Mac. Do you have a Mac with an ARM processor?

Adelme
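To check whether a given gbff file actually lacks the "/transl_table" qualifier, something like the following could be used. This is only a minimal sketch using the Python standard library. The function names `transl_table_values` and `report` are made up for illustration and are not part of ppanggolin, and the check assumes each qualifier starts on its own indented line, as in the GenBank flat-file format.

```python
import re
from pathlib import Path

# Assumption: a qualifier sits on its own indented line and starts with "/".
TRANSL_TABLE = re.compile(r"^\s*/transl_table=(\S+)")


def transl_table_values(gbff_text: str) -> list[str]:
    """Return every /transl_table value found in the text, in file order."""
    values = []
    for line in gbff_text.splitlines():
        match = TRANSL_TABLE.match(line)
        if match:
            values.append(match.group(1))
    return values


def report(path: str) -> None:
    """Print whether the file at `path` declares a genetic code anywhere."""
    values = transl_table_values(Path(path).read_text())
    if values:
        print(f"{path}: {len(values)} /transl_table qualifier(s), values {sorted(set(values))}")
    else:
        print(f"{path}: no /transl_table qualifier found")
```

Calling `report("Gsp_SRRXXX.gbff")` on a failing file and on one that imports fine should show whether the qualifier is the difference between them.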

Hello again!

Thank you so much for your quick response. I also believe that the issue is the missing "/transl_table" in the files that aren't working (the others are also gbff). I am using a newer M1 Mac.

Heather

Hi,

Alright, I'll fix this at some point in the coming days by setting a proper default if missing, thank you for reporting this.

We're planning to get a bugfix release out soon so it should be in the next release in the coming weeks.

Glad to know that ppanggolin works on Macs with ARM processors. I was not actually sure!

Adelme
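For readers wondering what "setting a proper default if missing" could look like, here is a rough sketch. It is not the actual ppanggolin patch: `resolve_genetic_code` and `DEFAULT_GENETIC_CODE` are hypothetical names, and the sketch assumes the feature qualifiers have already been parsed into a dictionary mapping qualifier names to string values. The default of 11 is the genetic code asked for in the original report.

```python
# Hypothetical default; 11 is the bacterial, archaeal and plant plastid code.
DEFAULT_GENETIC_CODE = 11


def resolve_genetic_code(qualifiers: dict, default: int = DEFAULT_GENETIC_CODE) -> int:
    """Return the genetic code of a feature as an integer.

    Falls back to `default` when the feature has no transl_table qualifier,
    and raises ValueError when the qualifier is present but not an integer.
    """
    raw = qualifiers.get("transl_table")
    if raw is None:
        # Missing qualifier: use the default instead of passing None along.
        return default
    try:
        return int(raw)
    except ValueError as err:
        raise ValueError(f"Invalid /transl_table value: {raw!r}") from err
```

With this approach a feature without the qualifier resolves to 11, while a malformed value still fails loudly rather than being silently replaced.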

The fix for this issue has been released in v2.1.0.