
Picks the best clustering % identity to use for a given organism/species, taking into account, in order of importance: (a) higher % ids produce better alignments, so a minimum threshold of 70% is enforced; (b) each cluster needs to be 'alignable', so the max size of any given cluster is set to 100 seqs, unless the % id is very high (e.g. >= 90%), in which case the sequences are similar enough that the max size can be raised to, say, 500 seqs and still produce a good alignment; (c) we want to use the largest sample of sequences possible, i.e. the minimum number of unclustered seqs and singletons, provided the first two conditions are satisfied.
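The three criteria above can be sketched in Go roughly as follows. This is a minimal illustration, not the repository's actual code: the `ClusterStats` type, its fields, the constant names, and `PickBest` are all hypothetical, and breaking ties in favour of the higher % id is an assumption based on criterion (a).

```go
package main

// ClusterStats summarises one clustering run at a given % identity.
// Hypothetical type: the field names are assumptions made for this sketch.
type ClusterStats struct {
	PercentID      float64 // clustering % identity, e.g. 85
	LargestCluster int     // number of seqs in the biggest cluster
	Unused         int     // unclustered seqs plus singletons
}

// Thresholds taken from the description; the names are illustrative.
const (
	minPercentID         = 70.0 // (a) minimum % id
	highPercentID        = 90.0 // (b) "very high" % id
	maxClusterSize       = 100  // (b) cap below highPercentID
	maxClusterSizeHighID = 500  // (b) cap at or above highPercentID
)

// alignable reports whether the largest cluster is within the size cap
// that applies at this run's % identity.
func alignable(s ClusterStats) bool {
	limit := maxClusterSize
	if s.PercentID >= highPercentID {
		limit = maxClusterSizeHighID
	}
	return s.LargestCluster <= limit
}

// PickBest returns the run that satisfies (a) and (b) and has the fewest
// unused seqs (c). Ties go to the higher % id (an assumption). The boolean
// is false when no run satisfies the first two conditions.
func PickBest(runs []ClusterStats) (ClusterStats, bool) {
	var best ClusterStats
	found := false
	for _, r := range runs {
		if r.PercentID < minPercentID || !alignable(r) {
			continue
		}
		better := !found ||
			r.Unused < best.Unused ||
			(r.Unused == best.Unused && r.PercentID > best.PercentID)
		if better {
			best, found = r, true
		}
	}
	return best, found
}
```

Treating (a) and (b) as hard filters and (c) as the quantity to minimise mirrors the stated order of importance: a run with fewer unused seqs is never chosen if it breaks the % id threshold or the cluster size cap.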

Primary Language: Go
