"Population must be a sequence. For dicts or sets, use sorted(d). in line 83" [Python 3.11 compatibility?]
prototaxites opened this issue · 4 comments
I am trying to run CAMISIM to generate a very small test metagenome dataset, containing a mixture of eukaryotic and prokaryotic genomes, to test a pipeline with. I am following the example in the usage guide and getting the following error:
2023-02-27 10:28:52 INFO: [MetagenomeSimulationPipeline] Metagenome simulation starting
2023-02-27 10:28:52 INFO: [MetagenomeSimulationPipeline] Validating Genomes
2023-02-27 10:28:52 INFO: [MetadataReader] Reading file: '/nfshome/store04/users/b.jmd20jns/camisim/genome_to_id.tsv'
2023-02-27 10:28:53 INFO: [MetagenomeSimulationPipeline] Design Communities
2023-02-27 10:28:53 INFO: [CommunityDesign] Drawing strains.
2023-02-27 10:28:53 INFO: [MetadataReader 31395689975] Reading file: '/nfshome/store04/users/b.jmd20jns/camisim/metadata.tsv'
2023-02-27 10:28:53 ERROR: [MetagenomeSimulationPipeline] Population must be a sequence. For dicts or sets, use sorted(d). in line 83
2023-02-27 10:28:53 INFO: [MetagenomeSimulationPipeline] Metagenome simulation aborted
Any idea what's going on and how to fix it?
metadata.tsv:
genome_ID OTU NCBI_ID novelty_category
Pseudomicrostroma_glucosiphilum 1 1684307 known_strain
Aureobasidium_pullulans 2 5580 known_strain
Anaeromicropila_populeti 3 37658 known_strain
Bacillus_subtilis 4 1423 known_strain
Erwinia_billingiae 5 182337 known_strain
Frondihabitans_PhB188 6 2485200 known_strain
Pseudarthrobacter_scleromae 7 158897 known_strain
Pseudomonas_fluorescens 8 294 known_strain
Variovorax_boronicumulans 9 436515 known_strain
genome_to_id.tsv:
Pseudomicrostroma_glucosiphilum genomes/GCA_003144135.1_Rhodsp1_genomic.fna
Aureobasidium_pullulans genomes/GCA_000721785.1_Aureobasidium_pullulans_var._pullulans_EXF-150_assembly_version_1.0_genomic.fna
Anaeromicropila_populeti genomes/GCA_900112775.1_IMG-taxon_2599185221_annotated_assembly_genomic.fna
Bacillus_subtilis genomes/GCA_000009045.1_ASM904v1_genomic.fna
Erwinia_billingiae genomes/GCA_000196615.1_ASM19661v1_genomic.fna
Frondihabitans_PhB188 genomes/GCA_003752365.1_ASM375236v1_genomic.fna
Pseudarthrobacter_scleromae genomes/GCA_014644515.1_ASM1464451v1_genomic.fna
Pseudomonas_fluorescens genomes/GCA_900215245.1_IMG-taxon_2617270901_annotated_assembly_genomic.fna
Variovorax_boronicumulans genomes/GCA_009811375.1_ASM981137v1_genomic.fna
Hey, thanks for bringing this to my attention. Are you by any chance using Python >= 3.11? As of Python 3.11, random.sample() no longer automatically converts sets (which includes dict key views) into a sequence before sampling, and there is one place where CAMISIM uses the keys of a dict as the population for random sampling.
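To make the behavior change concrete, here is a minimal sketch that tries to sample directly from a dict's key view and reports what happens on the running interpreter. The dict `otus` and the helper `sample_keys_directly` are hypothetical names for illustration only and are not part of CAMISIM. On Python 3.11 or newer the call should raise the TypeError seen in the log above; on older versions it should be accepted (with a DeprecationWarning on 3.9 and 3.10, which is silenced here).

```python
import random
import sys
import warnings

# Hypothetical stand-in for a dict keyed by OTU id (not CAMISIM's real data).
otus = {"1": "genome_a", "2": "genome_b", "3": "genome_c"}


def sample_keys_directly(mapping):
    """Try random.sample() on the dict's key view and describe the outcome."""
    try:
        with warnings.catch_warnings():
            # Python 3.9/3.10 only warn about set-like populations.
            warnings.simplefilter("ignore", DeprecationWarning)
            random.sample(mapping.keys(), len(mapping))
    except TypeError as error:
        # Python 3.11+ rejects set-like populations outright.
        return "TypeError: {}".format(error)
    return "accepted"


print(sys.version_info[:2], sample_keys_directly(otus))
```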
For compatibility with Python 3.11 there are two changes which need to be performed for CAMISIM to run:
- In scripts/configparserwrapper.py, line 5:
  from collections import Iterable
  needs to be changed to
  from collections.abc import Iterable
  (since CAMISIM does not run without that change, I assume you already did this?)
- In scripts/StrainSelector/strainselector.py, line 253:
  for otu_id in random.sample(self._otu_list.keys(), len(self._otu_list)):
  needs to be changed to
  for otu_id in random.sample(list(self._otu_list.keys()), len(self._otu_list)):
  making the conversion explicit.
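For anyone who wants to check the two changes in isolation before patching CAMISIM, the following is a small self-contained sketch of the same pattern. The names `otu_list` and `shuffled_keys` are hypothetical stand-ins for `self._otu_list` and the loop in strainselector.py. The try/except around the import is only one possible way to stay compatible with very old interpreters; it is an assumption, not what the maintainer proposed.

```python
import random

try:
    # Location of the abstract base classes since Python 3.3.
    from collections.abc import Iterable
except ImportError:
    # Fallback for legacy interpreters only (assumption, not CAMISIM code).
    from collections import Iterable

# Hypothetical stand-in for self._otu_list: a dict keyed by OTU id.
otu_list = {"1": ["strain_a"], "2": ["strain_b"], "3": ["strain_c"]}


def shuffled_keys(mapping):
    """Return all keys of `mapping` in random order.

    The keys are converted to a list first, so random.sample() always
    receives a sequence, which works on old and new Python versions alike.
    """
    return random.sample(list(mapping.keys()), len(mapping))


for otu_id in shuffled_keys(otu_list):
    print(otu_id, otu_list[otu_id])
```

The explicit list() call costs one extra copy of the keys, which should be negligible for the number of OTUs in a typical run.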
After this, CAMISIM runs on my end. I have not pushed these changes since I want to check that it keeps everything else intact and to ensure backward compatibility, but it should let you run CAMISIM.
If you are not using Python 3.11, then I am sorry and will have to check things again. In the meantime, I changed the title so that other people running into this can find the solution in this issue.
Hey, thanks for the very quick reply! Yes, I was using Python 3.11 (though I'm currently spinning up a 3.9 conda environment). I did figure out the first change but not the second - I'll see how I get on with the 3.9 environment in the first instance, but if that fails I'll give the above a go.
Hi, Python 3.9 did the trick! For anyone else stumbling across this, the following conda environment works to run CAMISIM quite happily:
conda create -n camisim python=3.9 perl matplotlib-base numpy biopython biom-format scikit-learn configparser ete3 perl-xml-simple
Glad that it works. We tested CAMISIM mainly on Python 3.7. I hope that most of these environment and version problems will be solved once we move to CAMISIM 2.0 (coming soon™).