panphlan_profiling.py: allow user to specify input files
For panphlan_profiling.py, the input (--i_dna) is specified as a directory, and panphlan_profiling.py then automatically finds the panphlan_map.py output csv files. However, it appears that panphlan_profiling.py just looks for files ending in *.bz2, so if any other such files are located in the panphlan_map.py output directory (e.g., the initial PANGENOME.tar.bz2 file downloaded via panphlan_download_pangenome.py and not deleted after decompression), panphlan_profiling.py dies with an error like the following:
$ panphlan_profiling.py --i_dna Eubacterium_rectale --pangenome Eubacterium_rectale/Eubacterium_rectale_pangenome.tsv --o_matrix out_matrix --verbose
STEP 1. Processing genes informations from pangenome file...
Number of reference genomes: 15
Average number of gene-families per genome: 3042
Total number of pangenome gene-families 11069
STEP 2. Create coverage matrix
[I] Reading mapping result file: Eubacterium_rectale.tar.bz2
Traceback (most recent call last):
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llmgps/.snakemake/conda/15b5bc2e/bin/panphlan_profiling.py", line 763, in <module>
    main()
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llmgps/.snakemake/conda/15b5bc2e/bin/panphlan_profiling.py", line 709, in main
    dna_samples_covs = read_map_results(args.i_dna, args.verbose)
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llmgps/.snakemake/conda/15b5bc2e/bin/panphlan_profiling.py", line 286, in read_map_results
    dna_samples_covs[dna_sample_id] = read_gene_cov_file(os.path.join(i_dna, dna_covs_file))
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llmgps/.snakemake/conda/15b5bc2e/bin/panphlan_profiling.py", line 274, in read_gene_cov_file
    gene, coverage = words[0], int(words[1])
IndexError: list index out of range
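As a stopgap, the directory scan could be made stricter so that unrelated archives are never handed to the coverage parser. The following is only a minimal sketch, not the actual PanPhlAn code: the function name find_mapping_files is hypothetical, and the ".csv.bz2" suffix is an assumption about how the panphlan_map.py outputs are named.

```python
import os


def find_mapping_files(i_dna, suffix=".csv.bz2"):
    """Return sorted paths of regular files in i_dna whose names end with suffix.

    Hypothetical helper: the default suffix is an assumption about how
    panphlan_map.py names its output files, not something taken from PanPhlAn.
    """
    selected = []
    for name in sorted(os.listdir(i_dna)):
        path = os.path.join(i_dna, name)
        # Skip sub-directories and anything without the expected suffix,
        # e.g. a leftover pangenome *.tar.bz2 archive.
        if os.path.isfile(path) and name.endswith(suffix):
            selected.append(path)
    return selected
```

With a filter like this, a file such as Eubacterium_rectale.tar.bz2 would be ignored rather than parsed as a coverage table.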
It would be helpful to allow the user to provide a list of input files via a text file. The alternative of accepting a comma-separated list of file paths via a CLI parameter can be problematic, because long file paths can produce commands that are too long (e.g., when processing hundreds or thousands of samples).
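To illustrate the request, here is a rough sketch of how such a text file might be read. None of this exists in PanPhlAn as far as this issue shows: the function name read_file_list is hypothetical, and the format (one path per line, with blank lines and lines starting with "#" ignored) is an assumption.

```python
import os


def read_file_list(list_path):
    """Read input file paths from a text file, one path per line.

    Hypothetical helper: blank lines and lines starting with '#' are
    skipped (an assumed convention, not a PanPhlAn feature).
    """
    paths = []
    with open(list_path) as handle:
        for line in handle:
            entry = line.strip()
            if not entry or entry.startswith("#"):
                continue
            paths.append(entry)
    # Fail early with a clear message instead of a parser traceback later.
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(
            "Listed input files not found: " + ", ".join(missing)
        )
    return paths
```

The returned list could then be passed to the existing per-file reader in place of the directory listing, which keeps the command line short regardless of how many samples are processed.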