HarryStevens/d3-regression

Superfluous code and robustness issues in loess.js?

jheer opened this issue · 1 comments

jheer commented

Thank you for this library! We're evaluating it for use within the Vega project. When reviewing the loess code I noticed this block:

finiteReal(xval);
finiteReal(yval);
finiteReal(weights);
strictlyIncreasing(xval);

These functions do not manipulate the data (no side effects) and each returns a boolean value that is never read. So as far as I can tell they have no effect whatsoever. (They seemed to have been copied more-or-less verbatim from science.js?) Should loess throw an error if the finite checks for x or y fail? I'm assuming so, though a (default?) option to drop invalid values might be nicer.

The weights vector is initialized to all ones in the previous step, so I don't see why that would need to be checked. Moreover, the weights array is never updated. It looks like the original implementation from science.js could accept an external weight vector; however, in this library (at the moment) the array is really just eating up some memory as the weights are only ever one.

I've also noticed degenerate output (including repeated x values in the output trend line) when using input data in which the x values are not strictly increasing. The current code checks for this, but the result of that check is ignored. While one could modify the code to raise an error, it would be nicer to have a loess method that permits multiple samples at the same x value.

One straightforward approach to the duplicated x-value issue might be to consolidate repeated points by taking the average y-value for each repeated x-value and setting the weight to the number of points in that "bin". However, I'd need to do some more work to suss out if this actually a reasonable thing to do from a statistical perspective. UPDATE: This approach would greatly affect the local regression neighborhood determination; an alternative approach is needed.

Hope this is helpful!

Thanks for taking a closer look at this and for the excellent pull request. As you correctly observed, my version was copied pretty much verbatim from science.js (except that it accepts an array of objects as its input and provides x- and y-accessors). All of your changes make sense to me, so I merged them unaltered. I also think it would make sense to use twoone of your ideas in all the other methods, namely: 1) clone the input array so it doesn't get modified, and 2) remove invalid input values. I'll be implementing thosethat shortly.