What is meant by the number of "sampled suffix positions"?
krinsman opened this issue · 0 comments
Referring specifically to the `-k` option of the `mummer` command.
I searched on Google, as well as in the complete MUMmer documentation at
https://github.com/mummer4/mummer/blob/master/docs/maxmat3src.pdf
The only place I could find the term "sampled suffix position" (or even "sampled position" or "suffix position") used was in the help output for the `-k` option.
Line 378 in 6c0da41
Apparently the `-k` option is valid only for `-maxmatch`, but it is not clear why.
Line 231 in 6c0da41
Also, the `-k` option apparently has something to do with "sparseness", but again it is not clear why.
Line 301 in 6c0da41
It is also not clear why the `-threads` option is valid only when `k > 1`. This in turn makes it even more difficult to understand why there are both a `-threads` and a `-qthreads` option.
I would have just ignored the option, except that the examples given in the help output both use a non-default value for `k`.
Line 391 in 6c0da41
Line 397 in 6c0da41
Here is the most basic question one could ask which isn't clear to me based on the documentation:
- Does a higher value of `k` lead to increased computational expense but better/more accurate results?
- Or does a higher value of `k` lead to decreased computational expense but worse/less accurate results?
The fact that multi-threading is only an option when `k > 1` suggests the former. At the same time, since it apparently has something to do with "sparseness", it also seems plausible that `k > 1` would lead to worse output.
Any improvements to the documentation would be greatly appreciated and would make me feel more confident recommending MUMmer to colleagues.