quickselect’s k is a misnomer.
mbostock opened this issue · 5 comments
quickselect(array, 0) selects the top 1 element, quickselect(array, 2) selects the top 3 elements, and so on. This makes the name k highly confusing because it normally refers to the number of elements you are selecting:
https://en.wikipedia.org/wiki/Selection_algorithm
https://en.wikipedia.org/wiki/Floyd–Rivest_algorithm
Also quickselect’s right index is inclusive, not exclusive like the other d3-array APIs, such as bisect’s lo and hi.
Option 1. Don’t change the functionality, but avoid the name “k” entirely and use one of “middle”, “mid” or “pivot”.
Option 2. Release d3-array@3, make “k” behave the way everyone expects (e.g., quickselect(array, 1) selects the top-1), rename “left” to “lo”, rename “right” to “hi” and make it exclusive.
I'm fine with both options. Re: option 2, I'm all for not breaking things, but I don't think it would be a major problem if quickselect's API changed. There are no real uses that I can find in public blocks or obs. notebooks, and the main usage is internal. (In that case, the sooner the better.)
If we go with no change:
- update README to read pivot instead of k.
- update forthcoming documentation in https://observablehq.com/d/a631067e30b51c0c accordingly.
@mourner would you also like to change the notation upstream?
This makes the name k highly confusing because it normally refers to the number of elements you are selecting:
I'm not sure that's the case — in Wikipedia, k
is referred to as an index, not a number of elements:
selection algorithm is an algorithm for finding the kth smallest number in a list or array; such a number is called the kth order statistic
The same goes for an equivalent C++ function, nth_element:
The element placed in the
nth
position is exactly the element that would occur in this position if the range was fully sorted.
The Wikipedia one may be confusing because it uses 1-based index to describe the algorithm, while programmers are used to 0-based index.
I agree on right
being exclusive in other implementations though — looks like this was a mistake on my part.
Yes, the Wikipedia articles treat k as an index; apologies for misreading. I think I’ve just been so engrained to treat k as a count of elements, as in k nearest-neighbors or “select the top k elements”, that I was greatly surprised to see it mean an index instead.
https://observablehq.com/@d3/d3-quickselect mentions k as the pivot index.