riatelab/statsbreaks

Count by class is wrong when the input array contains values that are not numbers or are non-finite numbers

Opened this issue · 1 comments

mthh commented

Consider the following code:

let discr = require("statsbreaks")

let data = [1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 'foo', -Infinity, NaN]
let series = new discr.JenksClassifier(data, 2);
let bks = series.classify(3);
let count = series.countByClass();

I think count should be [8, 5, 2] (as if we used [1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8] as input array) instead of [9, 5, 2, NaN].

The breaks returned are correct (because the input array is filtered in the inner classification function), but in the Classifier classes we store the input array before it is filtered:

this._values = values;

A quick fix is simply to store the filtered input array in the line of code shown above (although the internal classification function will then redo this filtering for nothing).
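To make the quick fix concrete, here is a minimal sketch. It is not the statsbreaks source: `SketchClassifier`, `isFiniteNumber`, the class-boundary convention, and the hand-picked breaks are all assumptions for illustration, and the library's own filter may behave differently (for example, it might also accept numeric strings).

```javascript
// Hypothetical predicate; the library's own filter may differ
// (for example, it might also accept numeric strings).
function isFiniteNumber(v) {
  return typeof v === "number" && Number.isFinite(v);
}

// Hypothetical stand-in for a Classifier class, not the statsbreaks source.
class SketchClassifier {
  constructor(values) {
    // Quick fix: keep the filtered array rather than the raw input.
    this._values = values.filter(isFiniteNumber);
  }

  // Counts values per class for ascending breaks [min, b1, ..., max].
  // Assumed convention: each class includes its upper bound, and the
  // first class also includes the minimum.
  countByClass(breaks) {
    const counts = new Array(breaks.length - 1).fill(0);
    for (const v of this._values) {
      // Skip anything outside the overall range of the breaks.
      if (v < breaks[0] || v > breaks[breaks.length - 1]) continue;
      let i = 0;
      while (i < counts.length - 1 && v > breaks[i + 1]) i++;
      counts[i]++;
    }
    return counts;
  }
}

const data = [1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, "foo", -Infinity, NaN];
// Illustrative breaks chosen by hand, not output from the library.
const counts = new SketchClassifier(data).countByClass([1, 3, 6, 8]);
console.log(counts); // [ 8, 5, 2 ]
```

With the filtered array stored, 'foo', -Infinity and NaN never reach the counting step, so no extra NaN class appears.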

A better fix might be to avoid doing this filtering twice (and to avoid creating too many new arrays, since doing array.filter(/* some code */).map(/* some code */) creates two new arrays). However, in most cases this shouldn't make any noticeable difference to performance.
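As a rough sketch of the second option, filtering and converting can be done in a single pass that builds one array instead of the two produced by array.filter(...).map(...). The helper name `toFiniteNumbers` and the rule for which inputs count as numeric (numbers and non-empty numeric strings) are assumptions, not the library's actual behaviour.

```javascript
// Hypothetical helper: filter and convert in one loop, allocating a single array.
function toFiniteNumbers(values) {
  const out = [];
  for (const v of values) {
    // Assumption: only numbers and non-empty strings are candidates.
    const isCandidate =
      typeof v === "number" || (typeof v === "string" && v.trim() !== "");
    if (!isCandidate) continue;
    const n = Number(v);
    // Drops NaN, Infinity and -Infinity.
    if (Number.isFinite(n)) out.push(n);
  }
  return out;
}

const cleaned = toFiniteNumbers([1, "2", "foo", -Infinity, NaN, null, "", 3.5]);
console.log(cleaned); // [ 1, 2, 3.5 ]
```

If the Classifier called such a helper once and passed the result on to the inner classification function, the filtering would not need to be repeated there; whether that fits the existing internal API is something the maintainers would have to judge.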

mthh commented

Interested in a fix? Any preference between my two options?