torognes/swarm

Visualize swarms for d>1

Closed this issue · 5 comments

I'm using visualizations of a few target taxa to explore the differences between d=1 and d=2. graph_plot.py seems to be working fine for d=2, but I am unclear as to how the graphics are defined.

From Swarm V2 (2015)
"Edges in these networks only represent the parameter d used; the length of the edges carries no information. The nodes in the networks represent amplicons."

It seems that the term amplicons is used to describe unique sequences. If this is the case, it seems that either each node in the network must represent a binning of multiple amplicons (each either d=2 or d=1 from their nearest neighbor node), or the edges in the network must represent both distances of 2 and distances of 1.

I haven't been able to parse out what's happening from the script (I'm not very familiar with Python!) but I'm hoping you can explain it. Thanks!

Hi @steph43,

Yes, amplicon means unique sequence. Any other sequence with at least 1 difference will be represented by another node in the graphical representation. When using d = 2, edges can represent distances of either 1 or 2, without any visual distinction.

Let's assume you have only one isolated OTU, with the same number of amplicons regardless of the d value used for clustering. All graphical representations (d = 1, d = 2, d = 3, etc.) will have the same number of nodes, but the number of segmented paths will decrease as d increases (I hope I did not make things less clear).
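To make the point about nodes and edges concrete, here is a minimal Python sketch. It is not the code of graph_plot.py: the toy sequences and the helper names `edit_distance` and `build_edges` are assumptions made up for illustration. It draws an edge between two amplicons whenever they differ by at most d and records the actual number of differences, which the plot itself does not show.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Number of substitutions, insertions or deletions between a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def build_edges(amplicons, d):
    """Map each pair of amplicons with 1..d differences to that count."""
    edges = {}
    for a, b in combinations(amplicons, 2):
        differences = edit_distance(a, b)
        if 1 <= differences <= d:
            edges[(a, b)] = differences
    return edges


# Hypothetical toy amplicons (unique sequences), one node each.
amplicons = ["ACGT", "ACGA", "ACCA", "TCCA"]

edges_d1 = build_edges(amplicons, 1)
edges_d2 = build_edges(amplicons, 2)

# The node set is the same for both values of d; only the edges change.
print(len(amplicons), "nodes")
print(len(edges_d1), "edges at d = 1:", sorted(edges_d1.values()))
print(len(edges_d2), "edges at d = 2:", sorted(edges_d2.values()))
```

In this toy example the four nodes are identical for both values of d. With d = 2 the graph gains edges that stand for 2 differences, next to the edges that stand for 1 difference.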

I am going to close that issue. Feel free to re-open it if my answer does not cover completely the initial question.

Hi!
I have a question regarding this matter. When I run swarm in fastidious mode, my -d is automatically 1 (so it looks for at least one SNP between reads, correct?). However, when I visualize the clusters using graph_plot.py and set -d 3, does that mean that the clusters visualized (could I call them haplotypes?) will be only the ones with at least 3 SNPs or indels? Also, when my central OTU has a bunch of amplicons marked (say 500) and I find a node away from it with no number, does that mean that node has only 1 amplicon different from the others? In that case, would I have 2 haplotypes/clusters, or just one that is away from the central OTU?

Thanks!

Swarm's default is to link amplicons with a single difference (one insertion or deletion or substitution); that's -d 1. With the fastidious option, swarm will allow a double difference (one insertion or deletion or substitution, twice) to link low-abundance amplicons to the closest cluster, assuming that intermediate amplicons were not observed for stochastic reasons.

If you produce visualization plots for clusters obtained with the -d 1 fastidious option, then amplicons connected during the fastidious phase will appear to "float" around (no edge). This is because the Python script does not take into account edges representing more than one step, i.e. more than d (amplicons linked during the fastidious phase have 2 differences with the amplicon they are connected to).
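The "floating" effect can be sketched in a few lines of Python. This is only an illustration, not what graph_plot.py actually does: the amplicon names, the pairwise difference counts and the function `floating_nodes` are all hypothetical. The idea is that a node gets no edge when every other member of its cluster is more than d differences away.

```python
def floating_nodes(nodes, differences, d):
    """Return the nodes that have no edge when only pairs with <= d differences are drawn."""
    linked = set()
    for (a, b), count in differences.items():
        if count <= d:
            linked.update((a, b))
    return [node for node in nodes if node not in linked]


# Hypothetical cluster from a -d 1 fastidious run: "grafted" joined the
# cluster during the fastidious phase and is 2 differences from "seed".
nodes = ["seed", "variant_1", "variant_2", "grafted"]
differences = {
    ("seed", "variant_1"): 1,
    ("variant_1", "variant_2"): 1,
    ("seed", "variant_2"): 2,
    ("seed", "grafted"): 2,
    ("variant_1", "grafted"): 3,
    ("variant_2", "grafted"): 4,
}

print(floating_nodes(nodes, differences, d=1))  # "grafted" has no edge
print(floating_nodes(nodes, differences, d=2))  # every node has an edge
```

With d = 1 the grafted amplicon belongs to the cluster but has no edge, so it floats. Raising the drawing threshold to 2 would connect it.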

If you produce visualization plots for clusters obtained with -d 2 or more, the edges in the graph will represent at most d differences (could be 1, 2, ... up to d).

Great! Thanks a lot! That clarifies it! (: