Usage with eggnog-mapper2
Lucas-Maciel opened this issue · 10 comments
- eggnog2gbk version: 0.0.7
- Python version: 3.8.2
- Operating System: CentOS Linux 7
Description
Hi, I'm trying to use your tool with my output from eggnog-mapper v2.
What I Did
I used your test data and it worked, but not with mine.
emapper2gbk genomic -fg ../Roseburia_inulinivorans_DSM16841/GCF_000174195.1_ASM17419v1_cds_from_genomic.fna -fp ../Roseburia_inulinivorans_DSM16841/GCF_000174195.1_ASM17419v1_protein.faa -o teste.out -a Roseburia_inulinivorans_DSM16841.emapper.annotations
The default organism name 'cellular organisms' is used.
Formatting fasta and annotation file for GCF_000174195.1_ASM17419v1_genomic
Traceback (most recent call last):
File "/raeslab/scratch/lucmac/miniconda3/bin/emapper2gbk", line 8, in <module>
sys.exit(cli())
File "/raeslab/scratch/lucmac/miniconda3/lib/python3.8/site-packages/emapper2gbk/__main__.py", line 245, in cli
gbk_creation(genome=args.fastagenome, proteome=args.fastaprot, annot=args.annotation, gff=args.gff, org=orgnames, gbk=args.out, gobasic=args.gobasic, dirmode=directory_mode, cpu=args.cpu, metagenomic_mode=False)
File "/raeslab/scratch/lucmac/miniconda3/lib/python3.8/site-packages/emapper2gbk/emapper2gbk.py", line 32, in gbk_creation
fa_to_gbk.main(genome, proteome, annot, org, gbk, gobasic)
File "/raeslab/scratch/lucmac/miniconda3/lib/python3.8/site-packages/emapper2gbk/fa_to_gbk.py", line 170, in main
faa_to_gbk(genome_fasta, prot_fasta, annot_table, species_name, gbk_out, gobasic)
File "/raeslab/scratch/lucmac/miniconda3/lib/python3.8/site-packages/emapper2gbk/fa_to_gbk.py", line 64, in faa_to_gbk
annotation_data = dict(read_annotation(annotation_data))
File "/raeslab/scratch/lucmac/miniconda3/lib/python3.8/site-packages/emapper2gbk/utils.py", line 269, in read_annotation
annotation_data.columns = headers_row
File "/home/lucmac/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 5475, in __setattr__
return object.__setattr__(self, name, value)
File "pandas/_libs/properties.pyx", line 66, in pandas._libs.properties.AxisProperty.__set__
File "/home/lucmac/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 669, in _set_axis
self._mgr.set_axis(axis, labels)
File "/home/lucmac/.local/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 220, in set_axis
raise ValueError(
ValueError: Length mismatch: Expected axis has 24 elements, new values have 1 elements
# Fri Feb 12 12:56:02 2021
# emapper-2.0.6
# emapper.py -i Roseburia_inulinivorans_DSM16841/GCF_000174195.1_ASM17419v1_protein.faa --cpu 4 --itype proteins -m diamond --output_dir eggnog --output Roseburia_inulinivorans_DSM16841
#
#query_name seed_eggNOG_ortholog seed_ortholog_evalue seed_ortholog_score eggNOG OGs narr_og_name narr_og_cat narr_og_desc best_og_name best_og_cat best_og_desc Preferred_name GOs EC KEGG_ko KEGG_Pathway KEGG_Module KEGG_Reaction KEGG_rclass BRITE KEGG_TC CAZy BiGG_Reaction PFAMs
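For reference, the last line of the traceback says that the header list built in `read_annotation` had 1 element while the annotation table had 24 columns. Below is a minimal sketch of reading such a file with the standard library only, so that the header and the rows are split the same way. It assumes the file is tab-separated, that the column header is the comment line starting with `#query` (as with `#query_name` above), and that every other `#` line is metadata. `read_emapper_annotations` is a hypothetical helper, not emapper2gbk's own code.

```python
from typing import Dict, Iterable, List


def read_emapper_annotations(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map each query ID to a {column name: value} dict.

    Hypothetical helper, assuming:
    - columns are tab-separated;
    - the column header is the comment line starting with "#query";
    - every other line starting with "#" is metadata and can be skipped.
    """
    header: List[str] = []
    rows: Dict[str, Dict[str, str]] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#query"):
                # Split on tabs only, so a name such as "eggNOG OGs" stays one column.
                header = line[1:].split("\t")
            continue
        fields = line.split("\t")
        if len(fields) != len(header):
            # Fail with both counts instead of a pandas "Length mismatch".
            raise ValueError(
                f"row has {len(fields)} fields but header has {len(header)}: {line[:60]!r}"
            )
        rows[fields[0]] = dict(zip(header, fields))
    return rows
```

An open file handle can be passed directly, e.g. inside `with open("Roseburia_inulinivorans_DSM16841.emapper.annotations") as handle:`, since iterating a file yields its lines.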
Hi Lucas, sorry for the delay.
What you describe is likely a bug caused by the changes in the latest eggnog-mapper releases.
Unfortunately, we don't have the latest releases (2.0.x) of emapper installed on our servers yet, and it appears that the online version of eggnog-mapper does not have the latest release in production either.
Could you run emapper on our test data and share the resulting .emapper.annotations file with us so we can fix the bug?
Hi @Lucas-Maciel,
I have pushed a commit on the genomic_update branch that should fix this issue:
https://github.com/AuReMe/emapper_to_gbk/tree/genomic_update
Can you test it?
Hello, I'm having the same issue. Was this ever resolved?
Actually, I'm also getting a different error that seems to be related to simplejson. I'm uploading my files now, with the command and the error message in a txt file.
Hi @kieft1bp-sys,
> Hello, I'm having the same issue. Was this ever resolved?
This issue has been resolved in the genomic_update branch of emapper2gbk.
> Actually I'm getting a different error too, seems to be something related to simplejson? I'm uploading my files now with the call and error message in a txt file.
Sorry about this error message; I am currently adding a more user-friendly one in the new version.
This error is caused by the argument '-n "AB48"' in your command line. The '-n' argument expects a complete taxon name. For example, you will get the same error if you pass '-n "K-12"' instead of '-n "Escherichia coli K-12"'. If you have no genus or species name, you can use a family name instead, for example '-n "Enterobacteriaceae"'.
To check whether your taxon name is valid, you can query this URL (the same one emapper2gbk uses):
https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/
For example, for 'Escherichia coli' (replacing each ' ' with '%20'):
https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/Escherichia%20coli
If it returns a "No Results" message, either the taxon name is not in the database or it contains a typo.
Also, it seems that you are using a newer version of eggnog-mapper (2.1.2), which changes the output format, so it will not work with the current version of emapper2gbk. I have pushed a new commit on the genomic_update branch that should fix this issue.
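The same check can be scripted. Below is a minimal sketch with the Python standard library, using hypothetical helper names (`taxon_url`, `is_match`, `check_taxon`) that are not part of emapper2gbk. It assumes that a recognised name comes back as a non-empty JSON list and treats anything else, such as the "No Results" message or an HTTP error status, as no match.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

ENA_SCIENTIFIC_NAME_URL = "https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/"


def taxon_url(name: str) -> str:
    """Build the lookup URL; quote() encodes each space as %20."""
    return ENA_SCIENTIFIC_NAME_URL + urllib.parse.quote(name)


def is_match(body: str) -> bool:
    """Assumption: a match is a non-empty JSON list; anything else is no match."""
    try:
        data = json.loads(body)
    except ValueError:  # e.g. a plain-text "No Results" message
        return False
    return isinstance(data, list) and len(data) > 0


def check_taxon(name: str, timeout: float = 10.0) -> bool:
    """Query ENA and report whether the taxon name was recognised."""
    try:
        with urllib.request.urlopen(taxon_url(name), timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
    except urllib.error.HTTPError:
        # Assumption: an unknown name may also be signalled by an HTTP error status.
        # Other network errors (urllib.error.URLError) are left to propagate.
        return False
    return is_match(body)
```

For example, comparing `check_taxon("Escherichia coli K-12")` with `check_taxon("K-12")` before running emapper2gbk shows which name is worth passing to `-n`.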
However, I think there will still be two issues: the nucleotide FASTA you provided contains gene sequences, not genome (chromosome) sequences, so the genome mode will not work with it; and the GFF file is not in the format expected by emapper2gbk (presented here).
With the genomic_update branch, you should be able to obtain a GenBank file from your "AB48.fna", "AB48.faa" and "AB48.emapper.annotations.tsv" files with the command:
emapper2gbk genes -fn AB48.fna -fp AB48.faa -a AB48.emapper.annotations.tsv -o AB48.gbk -n "Taxon name"
Thanks for the extensive answer! I'll try out your suggestions today.
I tried using the last command you suggested after installing the new branch and the program runs fine but does not bring in any annotations from the emapper annotations file (see attached .gbk).
Also, I modified my .gff file according to the format you linked to (see attached .gff) and tried running in "genomes" mode with my correct genome assembly .fna file (see attached .fna). It ran fine but produced an odd-looking gbk file (attached .gbk), so maybe my reformatting didn't help. (adding .txt to all file extensions because github needs it).
AB48_genomes_mode.gbk.txt
AB48_genome.fna.txt
AB48_updated.gff.txt
> I tried using the last command you suggested after installing the new branch and the program runs fine but does not bring in any annotations from the emapper annotations file (see attached .gbk).
emapper2gbk only extracts GO Terms, EC numbers and gene names from the eggnog-mapper file. Genes without any of these annotations will not be annotated in the GenBank file. For example, the first 3 genes in the GenBank file are not annotated because they have no GO Terms, EC numbers or gene name in the eggnog-mapper annotation file.
But further down in the file, you can see that the gene "contig_5_1000" is annotated. You can also search the file for "go_component", "gene", "go_function", "go_process" or "EC_number" to find the annotations from eggnog-mapper.
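To see in advance which genes should end up annotated, you can filter the eggnog-mapper table for rows where at least one of the three extracted fields is filled. Below is a minimal sketch, assuming the rows are given as a mapping from gene ID to a {column: value} dict, the column names `Preferred_name`, `GOs` and `EC` from the header earlier in this thread, comma-separated multi-value fields, and that empty strings and `-` mean "no annotation". Both helpers are hypothetical and not part of emapper2gbk.

```python
from typing import Dict, List

# Columns assumed to hold the gene name, GO Terms and EC numbers.
ANNOTATION_COLUMNS = ("Preferred_name", "GOs", "EC")
# Values assumed to mean "no annotation" in the emapper table.
MISSING = {"", "-"}


def extracted_annotations(record: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the non-missing gene name, GO Terms and EC numbers of one row."""
    found: Dict[str, List[str]] = {}
    for column in ANNOTATION_COLUMNS:
        value = record.get(column, "").strip()
        if value in MISSING:
            continue
        # Multi-valued fields are assumed to be comma-separated.
        found[column] = [item for item in value.split(",") if item]
    return found


def annotated_genes(rows: Dict[str, Dict[str, str]]) -> List[str]:
    """IDs of genes with at least one gene name, GO Term or EC number."""
    return [gene_id for gene_id, record in rows.items() if extracted_annotations(record)]
```

Genes missing from the output of `annotated_genes` are the ones expected to appear without annotations in the GenBank file.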
> Also, I modified my .gff file according to the format you linked to (see attached .gff) and tried running in "genomes" mode with my correct genome assembly .fna file (see attached .fna). It ran fine but produced an odd-looking gbk file (attached .gbk), so maybe my reformatting didn't help. (adding .txt to all file extensions because github needs it).
> AB48_genomes_mode.gbk.txt
> AB48_genome.fna.txt
> AB48_updated.gff.txt
In this GenBank file, there are no annotations and no protein sequences associated with the genes. I think this is because, when you updated the GFF file, the CDS IDs stopped matching the IDs in the "AB48.fna" and "AB48.emapper.annotations.tsv" files.
For example, in the GFF file "cds-contig_5_1" is the CDS ID for "contig_5_1", so emapper2gbk searches for the ID "cds-contig_5_1" in "AB48.fna" and in "AB48.emapper.annotations.tsv". It will not find it, because in these files the gene is still labelled "contig_5_1".
Updating the IDs in both "AB48.fna" and "AB48.emapper.annotations.tsv" to the "cds-" prefixed form (e.g. "cds-contig_5_1") should fix this.
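Both steps, finding the mismatched IDs and adding the `cds-` prefix, can be sketched in Python. This is a minimal sketch under several assumptions: GFF attributes of the form `ID=cds-contig_5_1;...` in the 9th tab-separated column, FASTA IDs being the first word after `>`, and annotation rows being tab-separated with the gene ID in the first column. All function names are hypothetical, not part of emapper2gbk.

```python
from typing import Iterable, Iterator, List, Set

PREFIX = "cds-"


def gff_cds_ids(gff_lines: Iterable[str]) -> List[str]:
    """Collect the ID attribute of every CDS feature in a GFF3 file."""
    ids: List[str] = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\r\n").split("\t")
        if len(columns) < 9 or columns[2] != "CDS":
            continue
        for attribute in columns[8].split(";"):
            key, _, value = attribute.partition("=")
            if key.strip() == "ID":
                ids.append(value.strip())
    return ids


def fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """Take the first word of each '>' header line as the sequence ID."""
    ids: Set[str] = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def missing_ids(gff_lines: Iterable[str], fasta_lines: Iterable[str]) -> List[str]:
    """GFF CDS IDs that have no matching FASTA record."""
    known = fasta_ids(fasta_lines)
    return [cds_id for cds_id in gff_cds_ids(gff_lines) if cds_id not in known]


def add_prefix_to_fasta(fasta_lines: Iterable[str], prefix: str = PREFIX) -> Iterator[str]:
    """Rewrite '>contig_5_1 ...' as '>cds-contig_5_1 ...'; sequences are unchanged."""
    for line in fasta_lines:
        if line.startswith(">") and not line.startswith(">" + prefix):
            yield ">" + prefix + line[1:]
        else:
            yield line


def add_prefix_to_annotations(annot_lines: Iterable[str], prefix: str = PREFIX) -> Iterator[str]:
    """Prefix the first (query ID) column of every non-comment row."""
    for line in annot_lines:
        if line.startswith("#") or not line.strip() or line.startswith(prefix):
            yield line
        else:
            yield prefix + line
```

The rewritten lines can be written to new files (e.g. with `writelines`), and `missing_ids` run again on the result should then return an empty list.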