SegataLab/panphlan

IndexError: list index out of range

GaioTransposon opened this issue · 3 comments

Hi guys,

Is this message familiar?

STEP 4: Define strain-specific gene-families presence/absence (1,-1,-2,-3 matrix, option --o_idx)
[W] No DNA 1,2,3 index file has been written because no strain was detected.

STEP 5: Get presence/absence of gene-families (1,-1 matrix, option --o_matrix)
Traceback (most recent call last):
  File "/shared/homes/12705859/miniconda3/envs/panphlan_env/bin/panphlan_profiling.py", line 763, in <module>
    main()
  File "/shared/homes/12705859/miniconda3/envs/panphlan_env/bin/panphlan_profiling.py", line 738, in main
    sample2family2presence = get_genefamily_presence_absence(sample2family2dnaidx, sample_stats, avg_genome_length, args)
  File "/shared/homes/12705859/miniconda3/envs/panphlan_env/bin/panphlan_profiling.py", line 574, in get_genefamily_presence_absence
    families = sample2family2dnaidx[dna_samples[0]].keys()
IndexError: list index out of range
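The traceback is consistent with `dna_samples[0]` being evaluated on an empty list. Step 4 detected no strain, so step 5 has no sample from which to read gene families. The sketch below reproduces that failure in isolation and shows a hypothetical guard that replaces it with an explicit message. The function name `first_sample_families` and the assumed data shapes (a list of detected sample names, plus a mapping from sample to gene family to index) are inferred from the traceback and are not PanPhlAn's actual code.

```python
def first_sample_families(sample2family2dnaidx, dna_samples):
    """Return the gene families of the first detected sample (hypothetical helper).

    Assumes dna_samples lists the samples that passed the step-4 thresholds and
    sample2family2dnaidx maps sample -> {gene_family: index}.
    """
    # Without this guard, dna_samples[0] raises IndexError on an empty list,
    # which is what happens when no strain was detected in step 4.
    if not dna_samples:
        raise ValueError(
            "No sample passed the strain-detection thresholds; "
            "there is nothing to build a presence/absence matrix from."
        )
    return list(sample2family2dnaidx[dna_samples[0]].keys())


# The raw failure: indexing an empty list.
try:
    [][0]
except IndexError as err:
    print(err)  # list index out of range

# Same situation through the guarded helper: a clearer error instead.
try:
    first_sample_families({}, [])
except ValueError as err:
    print(err)

print(first_sample_families({"s1": {"famA": 1, "famB": -1}}, ["s1"]))  # ['famA', 'famB']
```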

My command was:
panphlan_profiling.py -p /shared/homes/12705859/panphlan/Blautia_wexlerae/Blautia_wexlerae_pangenome.tsv -i map/ --o_matrix ./matrix_out/profile_Blautia_wexlerae --min_coverage 1 --left_max 1.70 --right_min 0.30

I have run panphlan_profiling.py before and never had this problem. The only difference is how I downloaded the pangenomes (previously via panphlan_download, now via the browser). Mapping worked fine, and the output files have content (over 20 MB each, judging by their size), which is why I wonder why no strain was detected.

Thank you
Dany

Hello,

The warning in step 4 tells you that no sample in your input passed the thresholds you provided (--min_coverage 1 --left_max 1.70 --right_min 0.30). That does not necessarily mean that your species is absent from the samples, but rather that they are at the limits of what PanPhlAn can detect. You could relax the thresholds further (for example --min_coverage 0.9 --left_max 2 --right_min 0.10), but the profiling results should then be analyzed with care.
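To make the effect of relaxing thresholds concrete, here is a deliberately simplified model of per-sample filtering. It is an assumption, not PanPhlAn's algorithm. Each sample is reduced to a hypothetical summary consisting of a median coverage and two ratios taken from the ends of its gene-family coverage curve. A sample passes only if all three conditions hold. The flag names come from the thread, but the `SampleStats` fields and the sample values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SampleStats:
    # Hypothetical per-sample summary; PanPhlAn derives its own statistics
    # from the gene-family coverage curve, which this sketch does not reproduce.
    name: str
    median_coverage: float
    left_ratio: float   # assumed: high end of the coverage curve / median
    right_ratio: float  # assumed: low end of the coverage curve / median


def passing_samples(stats, min_coverage, left_max, right_min):
    """Return names of samples meeting all three (simplified) thresholds."""
    return [
        s.name
        for s in stats
        if s.median_coverage >= min_coverage
        and s.left_ratio <= left_max
        and s.right_ratio >= right_min
    ]


# Invented values for two low-coverage samples.
samples = [
    SampleStats("sampleA", 0.95, 1.9, 0.25),
    SampleStats("sampleB", 0.92, 1.8, 0.15),
]

strict = passing_samples(samples, min_coverage=1, left_max=1.70, right_min=0.30)
relaxed = passing_samples(samples, min_coverage=0.9, left_max=2, right_min=0.10)
print(strict)   # []
print(relaxed)  # ['sampleA', 'sampleB']
```

Under this model, the original settings leave an empty list of detected samples, which is the situation that ends in the IndexError above. The relaxed settings let borderline samples through, which is exactly why their profiles deserve extra scrutiny.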

Also, keep in mind that, depending on the species and the sample, a significant fraction of samples may not pass the PanPhlAn analysis. Sometimes more than half of my mapped samples do not pass profiling.

Out of curiosity, how many samples do you have in your input folder?

Hi!

Yes indeed, it's better to concatenate all the reads. Otherwise, a sample may fail to pass the --min_coverage threshold and be discarded.
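A minimal sketch of concatenating read files before mapping follows. The assumption is that you have several files for the same sample (for example, separate lanes or runs) and want them mapped as one input, so their reads count toward a single coverage estimate. The helper name `concatenate_reads` and the example file names are hypothetical. Byte-level concatenation is valid for plain FASTQ when each file ends with a newline. It is also valid for gzipped FASTQ, because a series of gzip members is itself a valid gzip stream.

```python
import shutil
from pathlib import Path


def concatenate_reads(inputs, output):
    """Byte-concatenate read files into a single file (hypothetical helper).

    Works for plain FASTQ (each input must end with a newline) and for
    gzipped FASTQ (concatenated gzip members form a valid gzip file).
    Do not mix compressed and uncompressed inputs.
    """
    output = Path(output)
    with output.open("wb") as out:
        for path in inputs:
            with Path(path).open("rb") as src:
                # Stream each file in chunks rather than loading it into memory.
                shutil.copyfileobj(src, out)
    return output


# Example usage with hypothetical file names for one sample:
# concatenate_reads(["sample1_L001.fastq.gz", "sample1_L002.fastq.gz"],
#                   "sample1.fastq.gz")
```

Streaming with `shutil.copyfileobj` avoids decompressing anything, so the combined file is produced quickly even for large read sets.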