Some problem about panaroo's result
Bin-Ma opened this issue · 6 comments
Hi @pansapiens,
I used panaroo to construct a bacterial pangenome. When I select the '--codons' parameter, it fails on genes whose lengths are not divisible by 3. I then removed the parameter and tried to build the codon alignment myself. However, the lengths of the aligned genes generated by panaroo were not multiples of 3.
I have no idea about how to solve this problem. Looking forward to your reply!
Bin
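The divisibility problem described above can be checked directly on an alignment file. The following is a minimal sketch, not part of panaroo: it assumes the per-gene alignments are plain FASTA, and the helper names `parse_fasta` and `not_codon_aligned` as well as the two example records are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} dict."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def not_codon_aligned(records: Dict[str, str]) -> List[str]:
    """Return IDs of sequences whose length (gaps included) is not a multiple of 3."""
    return [name for name, seq in records.items() if len(seq) % 3 != 0]


# Made-up records for illustration only.
example = """>gene_a
ATGAAATAG
>gene_b
ATGAA-TAGC
""".splitlines()

bad = not_codon_aligned(parse_fasta(example))
print(bad)
```

To check a real file, pass an open file handle to `parse_fasta` instead of the example lines.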
Hello!
Thanks for getting in touch about this, glad to hear that the codon alignment module is getting some use!
This is definitely not expected behaviour, as there are checks in the codon alignment code which should identify when DNA sequences are not alignable at the protein level, and default back to nucleotide-level alignment.
Would it be possible for you to share the error message panaroo returns when you've been running it with the --codons flag?
Thanks!
Thanks for your rapid response, @nzmacalasdair !
The following is the error message with the command panaroo -i ./gff/* -o panaroo_result_test -t 40 --clean-mode strict -a pan --remove-invalid-gene --codons. By the way, my panaroo version is 1.3.4.
Looking forward to your reply!
Bin
Hello,
Thanks for sharing this error message. While there may also be a problem here with nucleotide sequences not divisible by 3, this looks to me like an issue with the biopython version: Bio.Alphabet was removed in biopython 1.78, and you're getting an Alphabet error.
Could you please check the biopython version in the conda environment you are using to run panaroo? I see that it's an old roary environment, so it seems possible that biopython is out of date.
If it's older than 1.78, try updating biopython and running the alignment again. You can run the alignment step separately with panaroo-msa, or panaroo-msa-runner.py, to avoid having to repeat the pan-genome inference step of panaroo.
It might also be good to try a fresh install of panaroo in a clean conda environment.
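A quick way to do the version check suggested above is to query the installed package from Python inside the same conda environment. This is only a sketch: the helper names are hypothetical, and the 1.78 threshold is taken from the comment above rather than from panaroo's own requirements.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.78' or '1.79.dev0' into a tuple of its leading integer parts."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def biopython_is_pre_178(version: str) -> bool:
    """True if the version string is older than Biopython 1.78."""
    return parse_version(version) < (1, 78)


def installed_biopython() -> Optional[str]:
    """Return the installed Biopython version, or None if it is not installed."""
    try:
        return metadata.version("biopython")
    except metadata.PackageNotFoundError:
        return None


version = installed_biopython()
if version is None:
    print("Biopython is not installed in this environment")
else:
    state = "older than 1.78" if biopython_is_pre_178(version) else "1.78 or newer"
    print(f"Biopython {version} ({state})")
```

Run it with the Python interpreter of the environment that panaroo uses, otherwise it reports on a different installation.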
Hi @nzmacalasdair !
Thank you very much for your professional suggestion. Now, it works successfully.
In addition, two other error messages occurred with other datasets, as shown below. The command was: nohup panaroo -i ./gff/* -o panaroo_result -t 60 --clean-mode strict -a pan --remove-invalid-gene &
First error message (72 bacterial genomes):
Second error message (166 bacterial genomes):
Thanks again for your kind help. Looking forward to your reply.
Bin
Hi,
I've never seen errors like this before, and it is difficult for me to definitively diagnose them without the input data, as I think there is probably something unusual going on there. I'm not sure these two errors are actually the same issue, either, but here are some ideas:
- An issue with the --remove-invalid-gene flag. Could you run it without this flag (particularly on the 166 genomes) and let me know what issues you encounter (if any)?
- Something unexpected in your input data. In the case of the first error message, it almost looks like panaroo could not find any core genes. Are the 72 bacterial genomes very distantly related? In the second case, it looks like there is an issue with the 639th gene on the first contig in isolate 23. You can check the gene_data.csv file to see if there is anything unusual about this gene; the geneid in gene_data should be 23_0_640. Are the 72 and 166 isolate sets completely disjoint, i.e. there are no overlapping isolates between them?
- Are all the other dependencies up to date in the conda environment you are trying to run panaroo in? It might particularly be worth checking if numpy is up to date.
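The gene_data.csv lookup suggested above could be done along these lines. This is a hedged sketch: the column names `clustering_id` and `dna_sequence` are assumptions about the file layout and should be checked against the header of your own gene_data.csv, the helper names are hypothetical, and the two sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Union


def find_gene(
    handle: TextIO, gene_id: str, id_column: str = "clustering_id"
) -> Optional[Dict[str, str]]:
    """Return the first CSV row whose ID column equals gene_id, or None."""
    for row in csv.DictReader(handle):
        if row.get(id_column) == gene_id:
            return row
    return None


def summarise(
    row: Dict[str, str], dna_column: str = "dna_sequence"
) -> Dict[str, Union[int, bool, List[str]]]:
    """Report a few properties of the gene's DNA that commonly cause trouble."""
    dna = row.get(dna_column) or ""
    return {
        "length": len(dna),
        "multiple_of_3": len(dna) % 3 == 0,
        # Any characters other than A, C, G, T (for example N or gaps).
        "non_acgt": sorted(set(dna.upper()) - set("ACGT")),
    }


# Made-up stand-in for gene_data.csv; column names are assumed.
sample = io.StringIO(
    "clustering_id,dna_sequence\n"
    "23_0_639,ATGAAATAG\n"
    "23_0_640,ATGNAATA\n"
)

row = find_gene(sample, "23_0_640")
report = summarise(row) if row is not None else None
print(report)
```

For a real run, replace the `io.StringIO` sample with `open("gene_data.csv", newline="")` pointing into the panaroo output directory.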
This may have been caused by collisions occurring with temporary files being created in the TMPDIR system directory by python and the package gffutils. This has been addressed in v1.4.
I'll close this issue for now and re-open it if someone observes the same problem in the latest version.