d3/d3-array

medianIndex/quantileIndex doesn’t handle missing data

mbostock opened this issue · 3 comments

d3.medianIndex returns one less than the index of the median element. For example:

penguins[d3.medianIndex(penguins, (d) => d.body_mass_g)].body_mass_g // 3600
penguins[d3.medianIndex(penguins, (d) => d.body_mass_g) + 1].body_mass_g // 4050

I don’t think this is the intended behavior, even if we were trying to choose the “lower” element in the case where the array length is even. I expect it to return the index of the median value instead.

d3.median(penguins, (d) => d.body_mass_g) // 4050

Previously #140 #159.

It also seems to be returning the last index if there are multiple elements with the median value?

{
  const sorted = d3.sort(penguins, (d) => d.body_mass_g);
  const i = d3.medianIndex(sorted, (d) => d.body_mass_g);
  return sorted[i - 5].body_mass_g; // 4050
}

Ah ha. The problem is that we’re dropping undefined values in the initial filter:

values = Float64Array.from(numbers(values, valueof));

This means that the indexes are indexes into the filtered array, and hence are wrong. If we pre-filter the data the correct result is returned:

{
  const filtered = penguins.filter((d) => d.body_mass_g);
  const i = d3.medianIndex(filtered, (d) => d.body_mass_g);
  return filtered[i];
}

That said, we still appear to return the last index among equivalent values, and I think it would be preferable to return the first. Any of them would be correct, though, as long as we document it.
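For illustration, here is a minimal sketch of one way to keep the returned index aligned with the original array while still skipping missing values. It is not d3's implementation: `medianIndexOfOriginal` is a hypothetical helper, it uses a full sort rather than a quickselect, and it assumes we want the lower median when the count of defined values is even and the first original index among equal values.

```javascript
// Hypothetical helper, not part of d3: returns an index into the ORIGINAL
// array, or -1 if there are no defined numeric values.
function medianIndexOfOriginal(values, valueof = (d) => d) {
  const pairs = [];
  values.forEach((d, i) => {
    const v = valueof(d, i, values);
    // Skip null, undefined, and NaN, but remember where each kept value came from.
    if (v != null && !Number.isNaN(+v)) pairs.push([+v, i]);
  });
  if (pairs.length === 0) return -1;

  // Sort by value, breaking ties by original position.
  pairs.sort((p, q) => p[0] - q[0] || p[1] - q[1]);

  // Lower median rank among the defined values (an assumption of this sketch).
  const median = pairs[Math.floor((pairs.length - 1) / 2)][0];

  // Because ties are ordered by position, the first pair holding the median
  // value carries the smallest original index for that value.
  return pairs.find(([v]) => v === median)[1];
}

const a = [1, 1, 1, 2, 2, undefined, 2, 3];
const i = medianIndexOfOriginal(a);
console.log(i, a[i]); // 3 2
```

The key point is that each value travels with its original position, so dropping missing values never shifts the index that comes back.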

Fil commented

ouch!

  const a = [1, 1, 1, 2, 2, undefined, 2, 3];
  a[d3.medianIndex(a)]; // undefined