Count by class is wrong when the input array contains non-numeric or non-finite values
Consider the following code:
let discr = require("statsbreaks")
let data = [1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 'foo', -Infinity, NaN]
let series = new discr.JenksClassifier(data, 2);
let bks = series.classify(3);
let count = series.countByClass();
I think count should be [8, 5, 2] (as if we used [1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8] as the input array) instead of [9, 5, 2, NaN].
The returned breaks are correct (because the input array is filtered in the inner classification function), but the Classifier classes store the input array before it is filtered:
Line 25 in c016c68
A quick fix is simply to store the filtered input array in the line of code shown below (but we'll be redoing this filtering for nothing in the internal classification function).
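To make the quick fix concrete, here is a minimal sketch of what I have in mind. It is not the statsbreaks source: `SketchClassifier`, its `_values` field, and the breaks `[1, 3, 6, 8]` are hypothetical and hand-picked for illustration (they are not the output of the Jenks algorithm). It also assumes that "valid" means a finite value of type number, which may differ from the filter the library actually applies.

```javascript
// Hypothetical stand-in for a classifier class; NOT the statsbreaks implementation.
class SketchClassifier {
  constructor(values) {
    // Quick fix: store only finite numbers instead of the raw input.
    // Number.isFinite returns false for strings, NaN and +/-Infinity.
    this._values = values.filter((v) => Number.isFinite(v));
  }

  // Count values per class for breaks [b0, b1, ..., bn].
  // Each value goes to the first class whose upper bound is >= the value.
  countByClass(breaks) {
    const counts = new Array(breaks.length - 1).fill(0);
    for (const v of this._values) {
      for (let i = 0; i < counts.length; i++) {
        if (v <= breaks[i + 1]) {
          counts[i] += 1;
          break;
        }
      }
    }
    return counts;
  }
}

const sketchData = [1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 'foo', -Infinity, NaN];
const sketch = new SketchClassifier(sketchData);
// Illustrative breaks only (assumption), chosen to split the data into 3 classes.
const sketchCounts = sketch.countByClass([1, 3, 6, 8]);
console.log(sketchCounts); // → [8, 5, 2]
```

Since the invalid values never reach the stored array, the counts only cover the 15 valid values.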
A better fix might be to avoid doing this filtering twice (and to avoid creating too many new arrays, since array.filter(/* some code */).map(/* some code */) creates two new arrays). However, in most cases this shouldn't make any noticeable difference to performance.
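For the second option, a single loop can do the filtering and the mapping at once, so only one new array is created. This is just a sketch: `filterMap` is a hypothetical helper (not part of statsbreaks), and the multiply-by-10 transform is a placeholder for whatever the real map step does.

```javascript
// Hypothetical helper: filter and transform in one pass, building one array.
function filterMap(values, keep, transform) {
  const out = [];
  for (const v of values) {
    if (keep(v)) {
      out.push(transform(v));
    }
  }
  return out;
}

const rawValues = [1, 2, 'foo', -Infinity, NaN, 3];
const isValid = (v) => Number.isFinite(v); // assumption: valid = finite number
const scale = (v) => v * 10;               // placeholder transform

// One pass, one new array.
const onePass = filterMap(rawValues, isValid, scale);
// Equivalent two-pass version, which allocates an intermediate array.
const twoPass = rawValues.filter(isValid).map(scale);

console.log(onePass); // → [10, 20, 30]
```

The cleaned array could then be computed once in the constructor and passed to the internal classification function, so the filtering would not be repeated there.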
Interested in a fix? Any preference between my two options?