gtonkinhill/panaroo

A limit in number of input files?

dianalaucw opened this issue · 2 comments

I planned to use panaroo on 2386 .gff files created by prokka from the assemblies of 2386 strains. Each .gff file is in a separate directory named {strain}_prokka. For instance, GU693__24742_1_80.gff is in the directory GU693__24742_1_80_prokka.

I tried to run panaroo with 16 CPUs using this command: `panaroo -i /annotated_assembly/*prokka/*.gff -o /users/panaroo/ --clean-mode strict -t 16`.
It gave me the following error: `/usr/bin/singularity: Argument list too long`.

I wondered if there is a limit on the number or size of input files. If so, would the approach of merging panaroo graphs work? For instance, we could split the .gff files into a number of manageable parts and then use panaroo-merge to combine them. However, the example of merging panaroo graphs only covers two datasets (https://gtonkinhill.github.io/panaroo/#/merge/merge_graphs). Does it also work on more than two? May I also know where the documentation for panaroo-merge is?

Hello!

Merging panaroo graphs is one solution that will work, but panaroo will also accept as input a single text file, where each line of the file is the path to a supported GFF file. Doing this should be much quicker and easier than running it separately on subsets and merging!
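As an illustration (the paths and the list-file name below are just placeholders based on your example), you could build the list with `find`, which avoids expanding thousands of paths on the command line, and then pass that file to `-i`:

```sh
# Write one GFF path per line to a list file; find does not hit the
# shell's argument-length limit the way a large glob expansion does.
find /annotated_assembly -name '*.gff' -path '*_prokka/*' > gff_paths.txt

# Pass the list file to panaroo instead of the glob.
panaroo -i gff_paths.txt -o /users/panaroo/ --clean-mode strict -t 16
```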

Graph merging was designed for very large (>10^4 isolates) datasets, so it shouldn't be necessary in this case. There is no limit to the number of isolates that can be input into panaroo, but practical limitations (i.e. runtime) with very large or very diverse datasets mean that in those cases it is better to run subsets of the data and then merge.

(For completeness' sake) Yes, it is possible to merge as many datasets as you would like by providing all of the output directories with the -d argument to panaroo-merge. I'm afraid the webpage you found is currently all the documentation there is for panaroo-merge; we are working to improve this.
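For example, a rough sketch of merging more than two runs (the output directory names here are hypothetical, and the -o/-t options are assumed to mirror the main panaroo command):

```sh
# Merge several existing panaroo output directories into one combined graph.
panaroo-merge -d batch1_out/ batch2_out/ batch3_out/ -o merged_out/ -t 16
```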

Let me know if you have any problems with the list data input.

Closing this as it is hopefully resolved! Let me know if not.