I ran swarm clustering with d = 13 (yes, very large, I know; I am trying a few parameter values others have used, because I am working with animal COI metabarcoding data with high intra-specific variability). It ran smoothly.

I would like to plot the 3rd OTU. I am running the following command (an adjusted version of the one in your paper's supplementary material, Supp1):

python -s statistics.txt -i internal.txt -o 3

The statistics.txt and internal.txt files are the ones created by the -s and -i parameters during the initial clustering step with swarm. I am leaving out the -d parameter for now; as far as I can see, it defaults to zero when not defined.

However, I get this error message:

python -s statistics.stats -i internal.struct -o 3
Error: OTU does not exists or contains only one element.
Reading target OTU
Parsing amplicon relationships

Why does the 3rd OTU not exist? I have more than 9,000 OTUs.

This is what my statistics.txt file looks like (first few rows only). Ignore the bold font in the first row.

20 481765 ASV1 362738 0 3 36
16 386950 ASV2 210150 0 1 12
168 476890 ASV3 176472 0 5 55
11 145906 ASV4 143517 0 1 6
35 244657 ASV6 101936 0 2 20
7 187436 ASV7 88833 0 1 13
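For readers following along, here is a minimal Python sketch that reads rows like the ones above into named fields. The column meanings are my reading of the swarm manual's description of the statistics file (unique amplicons, total abundance, seed label, seed abundance, singletons, iterations, cumulated steps), and the names `SwarmStats` and `parse_stats_line` are made up for this illustration. It is not part of swarm or of the plotting script.

```python
from typing import NamedTuple


class SwarmStats(NamedTuple):
    """One row of a swarm statistics file (field names are assumptions)."""
    unique_amplicons: int
    total_abundance: int
    seed_label: str
    seed_abundance: int
    singletons: int
    iterations: int
    cumulated_steps: int


def parse_stats_line(line: str) -> SwarmStats:
    """Split one whitespace- or tab-separated row into typed fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
    return SwarmStats(
        int(fields[0]), int(fields[1]), fields[2],
        int(fields[3]), int(fields[4]), int(fields[5]), int(fields[6]),
    )


# The six rows quoted in this post.
rows = """20 481765 ASV1 362738 0 3 36
16 386950 ASV2 210150 0 1 12
168 476890 ASV3 176472 0 5 55
11 145906 ASV4 143517 0 1 6
35 244657 ASV6 101936 0 2 20
7 187436 ASV7 88833 0 1 13""".splitlines()

stats = [parse_stats_line(row) for row in rows]
third = stats[2]  # third row, i.e. the OTU seeded by ASV3
print(third.seed_label, third.unique_amplicons, third.total_abundance)
```

Under that reading, the third row describes an OTU with 168 unique amplicons, so the OTU itself clearly has more than one element.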

This is what my internal.txt file looks like (first few rows only). Ignore the bold font in the first row.

ASV1 ASV20 2 1 1
ASV1 ASV41 1 1 1
ASV1 ASV71 1 1 1
ASV1 ASV79 1 1 1
ASV1 ASV477 1 1 1
ASV1 ASV1299 1 1 1
ASV1 ASV1985 2 1 1
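A similar sketch for the internal structure rows above. I am assuming the five columns are: amplicon A, amplicon B, number of differences between them, OTU number, and cumulated steps from the seed to amplicon B (again my reading of the swarm manual). `Link` and `parse_link` are names invented for this example.

```python
from collections import Counter
from typing import NamedTuple


class Link(NamedTuple):
    """One row of a swarm internal-structure file (field names are assumptions)."""
    amplicon_a: str
    amplicon_b: str
    differences: int
    otu_number: int
    steps_from_seed: int


def parse_link(line: str) -> Link:
    """Split one whitespace- or tab-separated row into typed fields."""
    a, b, diff, otu, steps = line.split()
    return Link(a, b, int(diff), int(otu), int(steps))


# The seven rows quoted in this post.
rows = """ASV1 ASV20 2 1 1
ASV1 ASV41 1 1 1
ASV1 ASV71 1 1 1
ASV1 ASV79 1 1 1
ASV1 ASV477 1 1 1
ASV1 ASV1299 1 1 1
ASV1 ASV1985 2 1 1""".splitlines()

links = [parse_link(row) for row in rows]

# Count how many links belong to each OTU number (fourth column).
links_per_otu = Counter(link.otu_number for link in links)
print(dict(links_per_otu))
```

All seven rows shown here carry OTU number 1, so under this assumption the links of the 3rd OTU would sit further down the file.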

I would appreciate your help. The -s and -i output files are written by swarm based on your algorithm, so I don't see why my OTU isn't found. The same problem occurred when I told swarm to write the files as .stats and .struct files, as in your code from the supplementary material.

@naurasd thank you for trying swarm.

python -s statistics.txt -i internal.txt -o 3

Yes, --internal_structure internal.txt corresponds to swarm --internal-structure internal.txt, but --swarms swarms.txt corresponds to swarm --output swarms.txt (i.e. swarm's default output), not to swarm --statistics-file stats.txt.
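To make that mix-up easier to spot before plotting, here is a rough heuristic in Python that guesses whether a line comes from a statistics file or from swarm's default output. It assumes a statistics row has exactly seven columns with integers everywhere except the third (the seed label), and that a default-output line is simply a list of amplicon labels. The function name and the example labels in the second call are hypothetical, and the check is not something the plotting script does itself.

```python
def looks_like_stats_line(line: str) -> bool:
    """Heuristic: 7 columns, all integers except column 3 (the seed label)."""
    fields = line.split()
    if len(fields) != 7:
        return False
    numeric_columns = (0, 1, 3, 4, 5, 6)
    return all(fields[i].isdigit() for i in numeric_columns)


# A row taken from the statistics file quoted earlier in this thread.
stats_row = "168 476890 ASV3 176472 0 5 55"

# A made-up line in the style of swarm's default output (one OTU per line,
# amplicon labels separated by spaces); these labels are illustrative only.
swarms_row = "ASV3 ASV12 ASV57"

for row in (stats_row, swarms_row):
    kind = "statistics file" if looks_like_stats_line(row) else "swarms output"
    print(f"{row!r} looks like a line from: {kind}")
```

If the file passed to --swarms is classified as a statistics file, it is probably the wrong one.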

I realize now that the mixed option names are confusing (-s for graph_plot and -o for swarm). Sorry about that.

hi @frederic-mahe

thanks for getting back to me about this. Will try again then.

However, since you mention the plotting option in your paper, and since I misunderstood the procedure from the supplementary material, I think it would be worth adding an explanation with an example to the GitHub repository.


Thanks for the suggestion.

I've added an example to the help message of the script (commit f3a7c87).