labgem/PPanGGOLiN

ValueError: The gene family has not beed associated to a partition.

frdel1 opened this issue · 7 comments

Hi,
I am experiencing the following error with ppanggolin all:
"ValueError: The gene family has not beed associated to a partition."

Steps to reproduce:

# get a bunch of genomes to create the pangenome
datasets download genome accession GCF_009935005.1 GCF_001015835.1 GCF_009933955.1 GCF_001932715.1 GCF_027863375.1 --include gbff

# create the organism.gbff.list file
# create the pangenome with ppanggolin all
conda activate ppanggolin-2.1.0
ppanggolin all --anno /path/to/organism.gbff.list --cpu 1 --identity 0.8 --output /path/to/output_ppanggolin_all
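
The "create the organism.gbff.list file" step is not spelled out above. A minimal sketch of one way to do it, assuming the downloaded archive has been unzipped and each genome sits at `<data_dir>/<accession>/genomic.gbff` (the usual NCBI datasets layout, not confirmed in this thread). The function name `make_gbff_list` is hypothetical; the output is a tab-separated file with a unique genome name in the first column and the annotation path in the second:

```shell
# Sketch: print one "name<TAB>path" line per genome found under a data directory.
# Assumption: files are laid out as <data_dir>/<accession>/genomic.gbff.
make_gbff_list() {
    data_dir=$1
    for gbff in "$data_dir"/*/genomic.gbff; do
        # When nothing matches, the glob stays literal; skip it.
        [ -e "$gbff" ] || continue
        # Use the parent directory name (the accession) as the genome name.
        accession=$(basename "$(dirname "$gbff")")
        printf '%s\t%s\n' "$accession" "$gbff"
    done
}
```

It could be used as `make_gbff_list "$PWD/ncbi_dataset/data" > organism.gbff.list`, with the directory adjusted to wherever the archive was extracted.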

Best wishes

Hi !

Sorry to hear about that.
Could you launch your command again with the option --verbose 2 and share the results ?

Thanks

consol.out.txt

Sure, here it is.
Are you able to reproduce the bug by downloading the set of genomes specified in the datasets download genome accession command and running ppanggolin all ?

Hi!

Thanks for the output. As I suspected, you don't have enough genomes in your pangenome.
The partitioning method is based on the NEM algorithm, and to work with the default parameters, we suggest using at least 15 genomes. You can find more information about the PPanGGOLiN method in the publication here.

All is not lost, though. First, add the -K 2 option to the 'all' command. This option forces PPanGGOLiN to compute only two partitions.
If that does not work, I suggest following the step-by-step pangenome construction in the documentation (skip the workflow part), or, if you kept your pangenome, using the command explained here directly to customize the partitioning. @ggautreau will be of greater help than me at this stage.
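
To illustrate the advice above, here is a rough sketch that counts the genomes in the list file and emits `-K 2` only when there are fewer than the suggested 15. The helper names `count_genomes` and `partition_args` are hypothetical, and the threshold is the recommendation from this thread, not a limit enforced by the tool:

```shell
# Suggested minimum from the reply above; an assumption, not a hard limit.
MIN_GENOMES=15

# Count non-blank lines in the list file; each one describes a genome.
count_genomes() {
    grep -c '[^[:space:]]' "$1"
}

# Print "-K 2" when the list holds fewer genomes than MIN_GENOMES,
# and print nothing otherwise.
partition_args() {
    if [ "$(count_genomes "$1")" -lt "$MIN_GENOMES" ]; then
        echo "-K 2"
    fi
}
```

The result can be spliced into the original command, leaving the substitution unquoted so that `-K` and `2` are passed as separate arguments: `ppanggolin all --anno /path/to/organism.gbff.list $(partition_args /path/to/organism.gbff.list) --cpu 1 --identity 0.8 --output /path/to/output_ppanggolin_all`.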

Hi!
Thanks for the explanation and the tips, I will follow your advice and use at least 15 genomes then.

Another tip, if you don't mind me saying so.
You can build a pangenome with all genomes of your species from RefSeq or GenBank, for example, and project the pangenome on your five genomes of interest as explained here.

Thanks ! I have tried ppanggolin projection already, good stuff :)

Hi,
In version 2.1.1, we changed the log to show a warning instead of a debug message when the partition step fails, which makes the problem easier to spot.
Ideally, PPanGGOLiN should still work even if partitioning fails, as mentioned in issue #270.