drivenbyentropy/aptasuite

How to choose the right clustering parameters?

CTPAHHIK38RUS opened this issue · 1 comment

How do the parameters affect the final clustering result?
As I understand it:
LSH Dimension - the upper bound on the number of substitutions we accept between the seed sequence and another sequence in the cluster.
Edit Distance - the lower bound on that number.
Iteration - the number of rounds in which sequences are grouped into buckets; more rounds give a better result but also increase the computation time.
K-mer - the size of the "words" into which our sequences are divided and then compared with one another (as in BLAST).
K-mer Cutoff - I do not quite understand what this is.
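
To make the terms concrete, here is a minimal Python sketch of the two generic notions used above: splitting a sequence into overlapping k-mers, and the edit (Levenshtein) distance between two sequences. It is a textbook illustration only, not AptaSuite's code. The function names `kmers` and `edit_distance` are made up for this example, and whether AptaSuite counts only substitutions or also insertions and deletions is not something this thread confirms.

```python
def kmers(seq, k):
    """Return all overlapping substrings ("words") of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def edit_distance(a, b):
    """Levenshtein distance: minimum number of substitutions,
    insertions and deletions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


print(kmers("ACGTAC", 3))                 # ['ACG', 'CGT', 'GTA', 'TAC']
print(edit_distance("ACGTAC", "ACCTAC"))  # 1
```

In this picture, an edit-distance parameter would act as a threshold on the value returned by `edit_distance(seed, candidate)`.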

Or is the LSH Dimension the size of the "window" of the sequences over which we compare them with each other,
and Edit Distance the number of substitutions we accept within this window?
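
The second reading can be sketched with a generic position-sampling form of locality-sensitive hashing: a fixed number of sequence positions is chosen at random, and sequences that agree on all of those positions land in the same bucket. This is a sketch under assumptions, not AptaSuite's implementation. The name `lsh_buckets`, the `dimension` argument and the idea that "LSH Dimension" equals the number of sampled positions are hypothetical here and are not confirmed by this thread.

```python
import random
from collections import defaultdict


def lsh_buckets(seqs, dimension, seed=0):
    """Group sequences that agree on `dimension` randomly sampled positions.

    Hypothetical illustration: `dimension` plays the role of the "window"
    size, i.e. how many positions are compared when hashing.
    """
    rng = random.Random(seed)
    length = min(len(s) for s in seqs)
    # Sample distinct positions once; every sequence is hashed on the same ones.
    positions = sorted(rng.sample(range(length), dimension))
    buckets = defaultdict(list)
    for s in seqs:
        key = "".join(s[p] for p in positions)
        buckets[key].append(s)
    return positions, dict(buckets)


seqs = ["ACGTACGT", "ACGTACGA", "TTGTACGT", "GGGGCCCC"]
positions, buckets = lsh_buckets(seqs, dimension=4)
print(positions)
for key, members in buckets.items():
    print(key, members)
```

Under this reading, repeating the sampling with a different `seed` would correspond to one more iteration, and an edit-distance threshold would then be applied only to sequences that share a bucket with the seed sequence.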