medianIndex/quantileIndex doesn’t handle missing data
mbostock opened this issue · 3 comments
d3.medianIndex returns one less than the index of the median element. For example:
penguins[d3.medianIndex(penguins, (d) => d.body_mass_g)].body_mass_g // 3600
penguins[d3.medianIndex(penguins, (d) => d.body_mass_g) + 1].body_mass_g // 4050
I don’t think this is the intended behavior, even if we were trying to choose the “lower” element in the case where the array length is even. I expect it to return the index of the median value instead.
d3.median(penguins, (d) => d.body_mass_g) // 4050
It also seems to be returning the last index if there are multiple elements with the median value?
{
const sorted = d3.sort(penguins, (d) => d.body_mass_g);
const i = d3.medianIndex(sorted, (d) => d.body_mass_g);
return sorted[i - 5].body_mass_g; // 4050
}
Ah ha. The problem is that we’re dropping undefined values in the initial filter:
Line 36 in c62f825
This means that the returned indexes point into the filtered array rather than the original one, and hence are wrong. If we pre-filter the data, the correct result is returned:
{
const filtered = penguins.filter((d) => d.body_mass_g);
const i = d3.medianIndex(filtered, (d) => d.body_mass_g);
return filtered[i];
}
Although, we are still returning the last index of equivalent values (it appears), and I think it would be preferable to return the first. But, any of those would be correct as long as we document it.
ouch!
const a = [1, 1, 1, 2, 2, undefined, 2, 3];
a[d3.medianIndex(a)]; // undefined
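One possible way to get an index into the original array is to carry each value's original position through the filter, rather than filtering first and indexing afterwards. The sketch below is plain JavaScript and is not d3's implementation: the name `medianIndexOriginal` is hypothetical, and it assumes we want the "lower" element for even lengths and the first index among equal values, as suggested above.

```javascript
// Hypothetical helper (not part of d3): returns an index into the ORIGINAL
// array, skipping null, undefined, and NaN values.
function medianIndexOriginal(values, valueof = (d) => d) {
  // Pair each usable value with its position in the original array.
  const entries = [];
  for (let i = 0; i < values.length; ++i) {
    const v = valueof(values[i], i, values);
    if (v != null && !Number.isNaN(+v)) entries.push([+v, i]);
  }
  if (entries.length === 0) return -1; // nothing to take a median of

  // Sort by value; break ties by original index so earlier elements come first.
  entries.sort((x, y) => x[0] - y[0] || x[1] - y[1]);

  // Assumption: pick the lower of the two middle elements for even lengths.
  let k = Math.floor((entries.length - 1) / 2);

  // Step back to the first entry that shares the median value.
  while (k > 0 && entries[k - 1][0] === entries[k][0]) --k;

  return entries[k][1];
}

const a = [1, 1, 1, 2, 2, undefined, 2, 3];
const i = medianIndexOriginal(a); // 3
const medianValue = a[i]; // 2
```

A full sort is used here only for clarity; a selection-based approach would avoid the O(n log n) cost, but the index bookkeeping would be the same.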