neherlab/pan-genome-analysis

Failed to read gbff files

oeco28 opened this issue · 0 comments

I have a question. Most genome assemblies from NCBI are now available as gbff files. I am running into an issue where 2,124 genomes of a bacterium (Streptococcus pyogenes) fail to load, and I suspect the cause is the file format.
The error I get is:

Traceback (most recent call last):
  File "/opt/apps/panx/1.6.0/pan-genome-analysis-master/panX.py", line 256, in <module>
    myPangenome.extract_gbk_sequences()
  File "/opt/apps/panx/1.6.0/pan-genome-analysis-master/scripts/pangenome_computation.py", line 128, in extract_gbk_sequences
    extract_sequences(self.path, self.strain_list, self.folders_dict, self.gbk_present, self.enable_RNA_clustering)
  File "/opt/apps/panx/1.6.0/pan-genome-analysis-master/scripts/sf_extract_sequences.py", line 156, in extract_sequences
    gene_aa_dict, gene_na_dict, RNA_dict, enable_RNA_clustering)
  File "/opt/apps/panx/1.6.0/pan-genome-analysis-master/scripts/sf_extract_sequences.py", line 58, in gbk_translation
    locus_tag=feature.qualifiers['db_xref'][0].split(':')[1]
KeyError: 'db_xref'
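
The traceback shows that the lookup on line 58 assumes every feature carries a `db_xref` qualifier, so any feature without one raises the `KeyError`. Below is a rough sketch of a more tolerant lookup, not a tested patch for panX. The helper name `pick_locus_tag` is hypothetical, and falling back to `locus_tag` or `protein_id` is my assumption about which identifiers would be acceptable downstream. It only relies on `qualifiers` behaving like a dict that maps qualifier names to lists of strings, which is how Biopython exposes `feature.qualifiers`.

```python
def pick_locus_tag(qualifiers, fallback=None):
    """Return an identifier for a feature without assuming 'db_xref' exists.

    Hypothetical helper for illustration; not part of panX.
    """
    # Same extraction as the failing line, but only when the qualifier is present:
    # take the part after ':' in the first db_xref entry that contains one.
    for xref in qualifiers.get('db_xref', []):
        if ':' in xref:
            return xref.split(':')[1]

    # Assumed fallbacks: other identifiers a GenBank flat file may provide.
    for key in ('locus_tag', 'protein_id'):
        values = qualifiers.get(key)
        if values:
            return values[0]

    # Nothing usable was found; let the caller decide whether to skip the feature.
    return fallback


if __name__ == '__main__':
    # Made-up qualifier dicts, for demonstration only.
    print(pick_locus_tag({'db_xref': ['GI:12345']}))
    print(pick_locus_tag({'locus_tag': ['EXAMPLE_0001']}))
    print(pick_locus_tag({}, fallback='unnamed_feature'))
```

With something like this, a feature lacking `db_xref` would get another identifier or the fallback value instead of aborting the whole run. Whether the rest of the pipeline accepts those identifiers would still need to be checked.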