d3/d3-array

quickselect’s k is a misnomer.

mbostock opened this issue · 5 comments

quickselect(array, 0) selects the top 1 element, quickselect(array, 2) selects the top 3 elements, and so on. This makes the name k highly confusing because it normally refers to the number of elements you are selecting:

https://en.wikipedia.org/wiki/Selection_algorithm
https://en.wikipedia.org/wiki/Floyd–Rivest_algorithm

Also quickselect’s right index is inclusive, not exclusive like the other d3-array APIs, such as bisect’s lo and hi.

Option 1. Don’t change the functionality, but avoid the name “k” entirely and use one of “middle”, “mid” or “pivot”.

Option 2. Release d3-array@3, make “k” behave the way everyone expects (e.g., quickselect(array, 1) selects the top-1), rename “left” to “lo”, rename “right” to “hi” and make it exclusive.

Fil commented

I'm fine with both options. Re: option 2, I'm all for not breaking things, but I don't think it would be a major problem if quickselect's API changed. There are no real uses that I can find in public blocks or obs. notebooks, and the main usage is internal. (In that case, the sooner the better.)

Fil commented

If we go with no change:

@mourner would you also like to change the notation upstream?

This makes the name k highly confusing because it normally refers to the number of elements you are selecting:

I'm not sure that's the case — in Wikipedia, k is referred to as an index, not a number of elements:

selection algorithm is an algorithm for finding the kth smallest number in a list or array; such a number is called the kth order statistic

The same goes for an equivalent C++ function, nth_element:

The element placed in the nth position is exactly the element that would occur in this position if the range was fully sorted.

The Wikipedia one may be confusing because it uses 1-based index to describe the algorithm, while programmers are used to 0-based index.

I agree on right being exclusive in other implementations though — looks like this was a mistake on my part.

Yes, the Wikipedia articles treat k as an index; apologies for misreading. I think I’ve just been so engrained to treat k as a count of elements, as in k nearest-neighbors or “select the top k elements”, that I was greatly surprised to see it mean an index instead.

Fil commented

https://observablehq.com/@d3/d3-quickselect mentions k as the pivot index.