IndexError: list index out of range
GaioTransposon opened this issue · 3 comments
Hi guys,
Is this message familiar?
STEP 4: Define strain-specific gene-families presence/absence (1,-1,-2,-3 matrix, option --o_idx)
[W] No DNA 1,2,3 index file has been written because no strain was detected.
STEP 5: Get presence/absence of gene-families (1,-1 matrix, option --o_matrix)
Traceback (most recent call last):
  File "/shared/homes/12705859/miniconda3/envs/panphlan_env/bin/panphlan_profiling.py", line 763, in <module>
    main()
  File "/shared/homes/12705859/miniconda3/envs/panphlan_env/bin/panphlan_profiling.py", line 738, in main
    sample2family2presence = get_genefamily_presence_absence(sample2family2dnaidx, sample_stats, avg_genome_length, args)
  File "/shared/homes/12705859/miniconda3/envs/panphlan_env/bin/panphlan_profiling.py", line 574, in get_genefamily_presence_absence
    families = sample2family2dnaidx[dna_samples[0]].keys()
IndexError: list index out of range
My command was:
panphlan_profiling.py -p /shared/homes/12705859/panphlan/Blautia_wexlerae/Blautia_wexlerae_pangenome.tsv -i map/ --o_matrix ./matrix_out/profile_Blautia_wexlerae --min_coverage 1 --left_max 1.70 --right_min 0.30
I have run panphlan_profiling.py before and never had this problem. The only difference is how I downloaded the pangenomes (previously via panphlan_download, now via the browser). Mapping worked fine. The files have content (over 20 MB each, judging by their size), which is why I wonder why no strain was detected.
Thank you
Dany
Hello,
The warning in step 4 tells you that no sample in your input passed the thresholds you provided (--min_coverage 1 --left_max 1.70 --right_min 0.30). With no sample left, step 5 has nothing to work on, hence the IndexError. That does not necessarily mean that your species is absent from the samples, but rather that you are reaching the limits of what PanPhlAn can detect. You could lower the coverage thresholds further (for example --min_coverage 0.9 --left_max 2 --right_min 0.10), but then your profiling results should be analyzed with care.
Also, keep in mind that, depending on the species and the sample, a significant fraction of samples may not pass the PanPhlAn analysis. I sometimes have more than half of the mapped samples fail profiling.
Out of curiosity, how many samples do you have in your input folder?
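To make the failure mode concrete, here is a minimal Python sketch of why line 574 raises the error when no sample passes the thresholds, and how a guard would avoid the crash. The function name first_sample_families is hypothetical and this is not PanPhlAn's actual code or fix; it only mirrors the dna_samples[0] lookup shown in the traceback.

```python
def first_sample_families(sample2family2dnaidx, dna_samples):
    """Return the gene families of the first retained sample.

    Hypothetical helper (not part of PanPhlAn): returns an empty list
    instead of raising when no sample passed the coverage thresholds.
    """
    if not dna_samples:
        # Nothing was retained, so dna_samples[0] would raise IndexError.
        return []
    return list(sample2family2dnaidx[dna_samples[0]].keys())


# Reproduce the error from the traceback with an empty sample list.
dna_samples = []
try:
    dna_samples[0]
except IndexError as err:
    message = str(err)  # same message as in the traceback

print(message)
print(first_sample_families({}, dna_samples))
```

The guard only turns the crash into an empty result; the underlying cause is still that every sample was filtered out in step 4.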
Hi!
Yes indeed, it's better to concatenate all the reads. Otherwise, a sample could fail the min_coverage threshold and be discarded.
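As a rough illustration of the concatenation step, assuming the reads of one sample are split across several files (for example forward and reverse reads) and should be merged before mapping, a byte-wise copy is enough. The function name and file names below are placeholders, not part of PanPhlAn.

```python
import shutil
from pathlib import Path


def concatenate_reads(read_files, merged_path):
    """Append each read file to merged_path, in the given order.

    Hypothetical helper: a plain byte-wise copy. It assumes all inputs
    use the same format (all plain text or all gzip) and that plain-text
    files end with a newline.
    """
    with open(merged_path, "wb") as merged:
        for read_file in read_files:
            with open(read_file, "rb") as part:
                shutil.copyfileobj(part, merged)
    return Path(merged_path)


# Example call with placeholder file names:
# concatenate_reads(["sample1_R1.fastq", "sample1_R2.fastq"], "sample1.fastq")
```

The merged file is then mapped as a single sample, so all of its reads count toward the same coverage estimate.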