Is the penalized B-spline aware of errorbars in the data?

Question

exook opened this issue 8 years ago · 9 comments

Hi,

I am currently using this library to fit some scientific data with a P-Spline. I am having some problems with "Zero values" where the spline is too flexible and fits the zero values in my data and breaks the spline. Is there any way to make the spline aware of the error bars of the data or is the spline already fitting according to data points and their errors?

Best regards,
Alex

Answer 1 · 2017-04-28T20:06:47.000Z

Hi Alex,

The P-spline is fitted to the data points only - I am not sure what you mean by "error bars". Could you please elaborate?

Without more information I would have to guess that what you are experiencing is a case of overfitting. Maybe you can try increasing the alpha value for stronger regularization?

Kind regards,
Bjarne

Answer 2 · 2017-04-28T20:42:24.000Z

Hi, thank you for your answer. After further investigation I see that the p-spline does not break at zero values so there is nothing wrong with the current implementation of the spline.

By error bars I mean the uncertainty of a data point from a measurement. For my research I had to be certain that there was not yet a feature that incorporates uncertainties into the p-spline.

Do you think that it is possible to implement a p-spline that takes measurement uncertainties into account? Suppose the p-spline is asked to fit, among other data points, the (x, y) point (4, 6). If this point has a y-uncertainty of +/-0.5, then the segment of the p-spline around that point would not be penalized further as long as the spline passes between (4, 5.5) and (4, 6.5).
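
The tolerance band described above can be written as a small function. This is only a sketch of the idea in the question (it is usually called an epsilon-insensitive loss), not something the library is stated to implement; the function name `tolerance_band_residual` and the trial spline values are made up for illustration.

```python
def tolerance_band_residual(y_fit, y_obs, y_err):
    """Distance by which y_fit lies outside [y_obs - y_err, y_obs + y_err].

    Returns 0.0 when the fitted value is inside the error bar.
    """
    return max(0.0, abs(y_fit - y_obs) - y_err)


# The data point (4, 6) with a y-uncertainty of +/-0.5 from the question.
y_obs, y_err = 6.0, 0.5

# Hypothetical spline values at x = 4: two inside the band, one outside.
for y_fit in (5.5, 6.2, 7.0):
    excess = tolerance_band_residual(y_fit, y_obs, y_err)
    print(f"spline value {y_fit}: squared penalty {excess ** 2}")
```

Only the last trial value contributes a penalty, because it is the only one outside the band from 5.5 to 6.5.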

Thank you for this great library,
Alex

P.S. Is there any way to increase the number of knots in a p-spline, and in that way increase the resolution of the spline? I'm noticing that for low alpha the spline is very jagged, even in regions where it is not close to a data point in the y-direction.

Answer 3 · 2017-04-28T21:51:10.000Z

Hi, thanks for clearing that up. What you are looking for is called weighted least squares regression. This will allow you to specify the weight to put on each sample point. These weights are often specified as the reciprocal of the variance (higher variance/uncertainty gives lower weight). I have made an issue (#80) to include this in the next release. If you subscribe to it you will be notified when it has been completed.
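
A minimal NumPy sketch of inverse-variance weighting in a penalized B-spline fit may make this concrete. It is not SPLINTER's implementation or API: the helper names `bspline_basis` and `fit_weighted_pspline`, the clamped equidistant knot vector, the second-order difference penalty, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Design matrix of B-spline basis functions (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = np.zeros((x.size, len(knots) - 1))
    for i in range(len(knots) - 1):
        B[:, i] = (knots[i] <= x) & (x < knots[i + 1])
    # Put the right end point into the last non-empty interval.
    last = np.max(np.nonzero(np.diff(knots) > 0)[0])
    B[x == knots[-1], last] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(knots) - d - 1))
        for i in range(len(knots) - d - 1):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                B_new[:, i] += (x - knots[i]) / left * B[:, i]
            if right > 0:
                B_new[:, i] += (knots[i + d + 1] - x) / right * B[:, i + 1]
        B = B_new
    return B


def fit_weighted_pspline(x, y, sigma, num_basis=15, degree=3, alpha=0.1):
    """Solve (B^T W B + alpha D^T D) c = B^T W y with W = diag(1 / sigma^2)."""
    x, y, sigma = (np.asarray(v, dtype=float) for v in (x, y, sigma))
    inner = np.linspace(x.min(), x.max(), num_basis - degree + 1)
    knots = np.concatenate([[x.min()] * degree, inner, [x.max()] * degree])
    B = bspline_basis(x, knots, degree)
    w = 1.0 / sigma**2  # reciprocal of the variance
    D = np.diff(np.eye(num_basis), n=2, axis=0)  # second-order differences
    coef = np.linalg.solve(B.T @ (w[:, None] * B) + alpha * D.T @ D, B.T @ (w * y))
    return knots, coef


# Synthetic data: a sine curve with one spurious "zero value".
x = np.linspace(0.0, 1.0, 50)
truth = np.sin(2 * np.pi * x)
y = truth.copy()
y[12] = 0.0                      # the bad measurement
sigma = np.full_like(x, 0.05)    # same error bar on every point
sigma_known = sigma.copy()
sigma_known[12] = 10.0           # large uncertainty on the bad point

for label, s in (("equal weights", sigma), ("inverse-variance", sigma_known)):
    knots, coef = fit_weighted_pspline(x, y, s)
    fitted = bspline_basis(x, knots, 3) @ coef
    print(f"{label}: error at the bad point = {abs(fitted[12] - truth[12]):.4f}")
```

With a large uncertainty on the bad point, its weight becomes tiny and the fit near it is determined by the neighbouring samples instead.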

If you use the EQUIDISTANT knot vector you can specify the number of knots by setting numBasisFunctions. Let us know if this works for you.

Bjarne

Answer 4 · 2017-04-29T11:12:59.000Z

Weighted least squares is now available in the multidim-control-points branch. I have added a Python example which displays the use case.

Answer 5 · 2017-04-29T17:49:26.000Z

Wow, that was quick! Thank you so much, this will be a great addition to my field of research!

Answer 6 · 2017-04-29T20:14:28.000Z

Regarding the resolution of the spline. When I use:
"knot_spacing=KnotSpacing.AS_SAMPLED, num_basis_functions=int(1e6)"
It works as it should.

But when I use:
"knot_spacing=KnotSpacing.EQUIDISTANT, num_basis_functions=int(1e6)"
The spline goes out of control.

And using num_basis_functions=int(1e100) or other values doesn't change it.

Answer 7 · 2017-04-30T08:48:46.000Z

Hi,

The P-spline seems to be behaving as expected when I try to replicate your case.

A few hints that may help you when setting the knot vectors:

AS_SAMPLED will create a spline with a knot vector that has a knot spacing similar to the spacing of the samples. The number of knots will be such that the number of basis functions equals the number of samples. Thus, setting num_basis_functions has no effect when using this setting.
EQUIDISTANT_KNOTS will create a spline from a knot vector of equidistant knots. The number of knots is set so that the number of basis functions equals the parameter num_basis_functions.
Based on your graphs I would advise that you set num_basis_functions to a number in the range 10-20. I think that would be sufficient to capture the shape of your data - certainly, the numbers you have tried are unnecessarily high and will increase the probability of overfitting.
The alpha value should be adjusted based on the number of samples and basis functions. You may try to scale the alpha value as follows to make it less sensitive to the other parameters: new_alpha = (num_basis_functions / num_samples) * alpha.
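
As a worked example of the scaling rule in the last hint, here is a small sketch. The function name `scale_alpha` and the sample numbers (200 samples, alpha of 0.1) are hypothetical; 15 basis functions sits within the 10-20 range suggested above.

```python
def scale_alpha(alpha, num_basis_functions, num_samples):
    """new_alpha = (num_basis_functions / num_samples) * alpha."""
    return (num_basis_functions / num_samples) * alpha


# Hypothetical case: 15 basis functions fitted to 200 samples.
new_alpha = scale_alpha(0.1, num_basis_functions=15, num_samples=200)
print(f"scaled alpha: {new_alpha:.4f}")
```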

In short: num_basis_functions only works for EQUIDISTANT_KNOTS; num_basis_functions should be reduced; the regularization parameter alpha should be adjusted (preferably using K-fold cross validation).
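
One way to pick alpha by K-fold cross validation is sketched below with NumPy only. To keep it short it uses a degree-1 (piecewise linear) B-spline basis on equidistant nodes rather than the library's splines, so it illustrates the procedure and not SPLINTER's API; the names `hat_basis`, `fit`, and `kfold_cv_error`, the candidate alphas, and the synthetic data are all assumptions.

```python
import numpy as np


def hat_basis(x, nodes):
    """Design matrix of degree-1 B-splines (hat functions) on the nodes."""
    eye = np.eye(len(nodes))
    return np.column_stack([np.interp(x, nodes, eye[j]) for j in range(len(nodes))])


def fit(x, y, nodes, alpha):
    """Penalized least squares with a second-order difference penalty."""
    B = hat_basis(x, nodes)
    D = np.diff(np.eye(len(nodes)), n=2, axis=0)
    return np.linalg.solve(B.T @ B + alpha * D.T @ D, B.T @ y)


def kfold_cv_error(x, y, nodes, alpha, k=5, seed=0):
    """Mean squared prediction error over k held-out folds."""
    idx = np.random.default_rng(seed).permutation(len(x))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coef = fit(x[train], y[train], nodes, alpha)
        pred = hat_basis(x[fold], nodes) @ coef
        errors.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(errors))


# Synthetic noisy data and 15 equidistant nodes (15 basis functions).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)
nodes = np.linspace(0.0, 1.0, 15)

alphas = 10.0 ** np.arange(-3, 5)  # candidate values 1e-3 ... 1e4
cv_errors = [kfold_cv_error(x, y, nodes, a) for a in alphas]
best_alpha = alphas[int(np.argmin(cv_errors))]
print("best alpha:", best_alpha)
```

The same folds are reused for every candidate (fixed seed), so the errors are directly comparable across alpha values.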

It is on our TODO list to improve the documentation and rework the BSpline::Builder to make the construction of knot vectors clearer to the user.

Hope this helps.

Answer 8 · 2017-05-01T05:32:40.000Z

Hi again,

Thank you so much for all the help you have been giving me. I am currently wrapping up my Bachelor's thesis in physics at Lund University, Sweden (due 7th of May), in which I use your Splinter library. If I have understood it correctly, the mathematics of splines can be implemented using different methods. Since I have mostly used Wikipedia and various lecture slides to understand splines, I wonder if you would have time to read my section on splines and p-splines (1.5 pages) to check that I have not got anything badly wrong.

I have emailed you about this at an "itk.ntnu.no" email address, but it is very likely that it ended up in a spam folder or that the address is inactive. (If you have time, you can reach me at pekman@uci.edu.)

Best regards,
Alex

Answer 9 · 2017-05-01T14:58:37.000Z

I have sent you my comments by e-mail.

Best of luck on your thesis!