veg/hivtrace

Cluster diameter related features

spond commented

It might be desirable to

  1. Report the diameter of each cluster, i.e. the maximum of distance(k, n) over all pairs of sequences k, n in the cluster.
  2. Optionally split or reduce clusters so that a maximum diameter, specified a priori, is enforced. How best to enforce the maximum is not yet clear, because several strategies appear feasible.
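
To make the two items concrete, here is a minimal sketch, not hivtrace code. It assumes a hypothetical `dist(a, b)` callable that returns the distance between any two cluster members. The splitting function shows only one of the feasible strategies: complete-linkage agglomeration that stops before a merge would exceed the maximum diameter. The names `cluster_diameter`, `split_by_max_diameter`, and the toy distances are all illustrative assumptions.

```python
from itertools import combinations


def cluster_diameter(members, dist):
    """Largest pairwise distance among members (0.0 if fewer than two)."""
    return max((dist(a, b) for a, b in combinations(members, 2)), default=0.0)


def split_by_max_diameter(members, dist, max_diameter):
    """One possible strategy (an assumption, not the issue's chosen method):
    complete-linkage agglomeration that refuses any merge whose resulting
    group would have a diameter above max_diameter."""
    groups = [[m] for m in members]
    while len(groups) > 1:
        best = None
        for i, j in combinations(range(len(groups)), 2):
            # Complete linkage: the largest distance across the two groups.
            d = max(dist(a, b) for a in groups[i] for b in groups[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > max_diameter:
            break  # every remaining merge would violate the limit
        groups[i] = groups[i] + groups[j]
        del groups[j]  # safe because i < j
    return groups


# Toy, made-up distances for three hypothetical sequences.
toy = {
    frozenset(("A", "B")): 0.010,
    frozenset(("A", "C")): 0.020,
    frozenset(("B", "C")): 0.012,
}


def toy_dist(a, b):
    return toy[frozenset((a, b))]


print(cluster_diameter(["A", "B", "C"], toy_dist))
print(split_by_max_diameter(["A", "B", "C"], toy_dist, 0.015))
```

Because each merge is accepted only when the largest cross-group distance is within the limit, every resulting group has a diameter no greater than `max_diameter`. The sketch needs distances for all pairs in the cluster, including pairs above the clustering threshold.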

A general implementation would require either that all pairwise TN93 distances (as opposed to only those that are ≤ threshold) be available, which is not desirable because of file sizes, or the ability to recompute distances on the fly, which would entail subsequent calls to TN93 to compute distances between all sequences in a cluster.
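
The on-the-fly option could look roughly like the sketch below, which recomputes every pairwise distance within a single cluster, n(n-1)/2 evaluations for n sequences. Everything here is an assumption for illustration: `within_cluster_distances` is a hypothetical helper, and `p_distance` (the fraction of differing sites) is only a stand-in metric, not TN93. A real implementation would replace it with a call into TN93.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Placeholder metric (NOT TN93): fraction of differing aligned sites."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    if not s1:
        return 0.0
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def within_cluster_distances(cluster_ids, sequences, distance_fn=p_distance):
    """Recompute all pairwise distances for the members of one cluster.

    cluster_ids: sequence IDs belonging to the cluster
    sequences:   mapping from sequence ID to aligned sequence string
    distance_fn: any symmetric distance on two sequences (assumed interface)
    """
    return {
        frozenset((a, b)): distance_fn(sequences[a], sequences[b])
        for a, b in combinations(cluster_ids, 2)
    }


# Toy, made-up aligned sequences.
toy_sequences = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGAACGA",
}

distances = within_cluster_distances(["s1", "s2", "s3"], toy_sequences)
print(max(distances.values()))  # the cluster diameter under the placeholder metric
```

Restricting the recomputation to one cluster at a time avoids storing the full distance matrix for the whole data set, at the cost of repeating work that the initial thresholded pass already did for the pairs below the threshold.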