quickselect’s k is a misnomer.

Question

quickselect’s k is a misnomer.

mbostock opened this issue 6 years ago · 5 comments

quickselect(array, 0) selects the top 1 element, quickselect(array, 2) selects the top 3 elements, and so on. This makes the name k highly confusing because it normally refers to the number of elements you are selecting:

https://en.wikipedia.org/wiki/Selection_algorithm
https://en.wikipedia.org/wiki/Floyd–Rivest_algorithm

Also quickselect’s right index is inclusive, not exclusive like the other d3-array APIs, such as bisect’s lo and hi.

Option 1. Don’t change the functionality, but avoid the name “k” entirely and use one of “middle”, “mid” or “pivot”.

Option 2. Release d3-array@3, make “k” behave the way everyone expects (e.g., quickselect(array, 1) selects the top-1), rename “left” to “lo”, rename “right” to “hi” and make it exclusive.

Answer 1 · 2019-04-30T09:30:15.000Z

I'm fine with both options. Re: option 2, I'm all for not breaking things, but I don't think it would be a major problem if quickselect's API changed. There are no real uses that I can find in public blocks or obs. notebooks, and the main usage is internal. (In that case, the sooner the better.)

Answer 2 · 2019-05-13T09:08:11.000Z

If we go with no change:

update README to read pivot instead of k.
update forthcoming documentation in https://observablehq.com/d/a631067e30b51c0c accordingly.

@mourner would you also like to change the notation upstream?

Answer 3 · 2019-05-13T13:46:12.000Z

This makes the name k highly confusing because it normally refers to the number of elements you are selecting:

I'm not sure that's the case — in Wikipedia, k is referred to as an index, not a number of elements:

selection algorithm is an algorithm for finding the kth smallest number in a list or array; such a number is called the kth order statistic

The same goes for an equivalent C++ function, nth_element:

The element placed in the nth position is exactly the element that would occur in this position if the range was fully sorted.

The Wikipedia one may be confusing because it uses 1-based index to describe the algorithm, while programmers are used to 0-based index.

I agree on right being exclusive in other implementations though — looks like this was a mistake on my part.

Answer 4 · 2019-05-13T14:50:19.000Z

Yes, the Wikipedia articles treat k as an index; apologies for misreading. I think I’ve just been so engrained to treat k as a count of elements, as in k nearest-neighbors or “select the top k elements”, that I was greatly surprised to see it mean an index instead.

Answer 5 · 2020-05-29T15:55:49.000Z

https://observablehq.com/@d3/d3-quickselect mentions k as the pivot index.